ABSTRACT

An analysis of the MacPitts silicon compiler is
presented. The emphasis of the analysis is on the
interrelationship between algorithmic syntax and resulting
circuit structure. Errors inherent to the silicon compiler
are investigated, and corrections to the errors are

I. INTRODUCTION

The purpose of silicon compilation is to allow faster
design of integrated circuits. Silicon compilation frees the
designer from the basic layout, routing, and circuitry
concerns inherent to integrated circuit design. The MacPitts
silicon compiler does this by designing an integrated
circuit chip from a behavioral specification input.

Previous work at the Naval Postgraduate School
investigated applications of the MacPitts silicon compiler
to design of pipelined digital adders [Ref. 1] and
multipliers [Ref. 2]. Work by Froede [Ref. 3] showed the
limitations of MacPitts in its inability to produce fast
VLSI chips. This deficiency is due primarily to the layout
scheme (circuit structure) which MacPitts uses.

This thesis investigates the interrelationship between
MacPitts algorithmic syntax and resulting circuit structure.
MacPitts partitions the chip functionally as shown in Figure
1.1. The data path is at the top, and performs numerical
operations and combinational logic tests. The control path
is at the bottom, and performs decisions which direct data
path operations.

Chapter II considers combinational logic in both the
data path and control path. The effects of syntax on
combinational logic structures are investigated
qualitatively, and inefficiencies and limitations of
implementation are noted. The basic data path organelles
(fundamental combinational logic structures) are also
investigated.

Figure 1.1 MacPitts Chip Functional Block Diagram
(blocks shown: GND, 3-phase CLK, I/O pads, data path,
control path, VDD)

Chapter III is a quantitative treatment of functionally
equivalent circuits in the data path and control path. A
five-input AND gate is created in both the data path and
the control path, and a comparative analysis is performed.
The results are extended to similar data path combinational
logic structures.

Chapter IV investigates MacPitts sequential logic. A
Gray code-to-binary serial decoder is designed, and a
functional analysis is performed. The relationship between
syntax and circuit structure is emphasized, with an
alternate solution considered. A blackjack game chip is
presented as a more elaborate MacPitts finite state machine
(FSM), and its structure is contrasted to that of the Gray
code decoder. The Mead-Conway highway-farmroad traffic light
controller [Ref. 4: p. 81] problem is solved with a
MacPitts design, and an alternate solution is offered.

Chapter V is a quantitative comparison of a MacPitts
design with a handcrafted equivalent. The Mead-Conway
traffic light controller design from Chapter IV is compared
to a computer-aided engineering (CAE)-designed variant,
which has a programmed logic array (PLA) FSM. The designs
are compared for speed, size, and power consumption.


Chapter VI is a design example. A design cycle for
MacPitts is developed, and illustrated with the Hamming 15/4
error detector/corrector [Ref. 5]. The prototype (first
model) and archetype (chief model) algorithms and chip
layouts are provided. An analysis of the alternate designs
is given, and a basis for choosing the archetype is
proposed. The Hamming 15/4 error detector/corrector is then
designed based on the archetype, and analyzed with available
CAD tools.

Chapter VII is a summary of errors detected in the
MacPitts silicon compiler and suggestions for enhancement.
The errors and suggestions are cross-referenced to MacPitts
source code where possible.

II. COMBINATIONAL LOGIC STRUCTURES IN THE MACPITTS
SILICON COMPILER

Inasmuch as the MacPitts algorithm creates combinational
logic functions, it would be helpful to know how it does
this. Does there exist an explicit directive to the LISP
object file which calls and implements the logical functions
requested, or are they implicitly specified? If the latter
is true, it would suggest simpler source algorithms could be
written to specify the circuit function. If the former case
is true, then more lengthy algorithms are required, but the
circuit designer has more latitude for direct control and
optimization of layout.

A.   COMBINATIONAL  LOGIC  CIRCUITS  IN  THE  DATA  PATH 

Combinational logic structure instantiation in the data
path of a MacPitts generated chip is directed by the
data-path.lisp file in the MacPitts source code. The
data-path.lisp file calls specific functional units called
organelles from the organelles.lisp file to implement the
desired logic. These LISP files are compiled under the Liszt
compiler and linked to the rest of the compiled MacPitts
files by the available Makefile routine. The resulting 1.6
Megabyte binary image constitutes the integrated MacPitts
silicon compiler.


1. The Basic Chip Frame

The initial investigation consisted of the
MacPitts-generated design frame called wire.mac. The
algorithm to create this structure is shown in Figure 2.1.

;WIRE.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF NO
;FUNCTION BY MACPITTS SILICON COMPILER
(program wire 1
  (def 1 ground)
  (def ain port input (2))
  (def res port output (3))
  (def 4 phia)
  (def 5 phib)
  (def 6 phic)
  (def 7 power)
  (always
    (setq res ain)))

Figure 2.1 Wire.mac


The extension .mac refers to a MacPitts algorithm. MacPitts
is taken to refer to the silicon compiler, the pseudo-LISP
language which it uses, and the LISP source routines which
constitute the silicon compiler. To avoid confusion, the
MacPitts driver routines written by the chip designer will
be referred to as algorithms. Other meanings of the term
MacPitts will be clarified by context.

MacPitts produces a seven pad chip, routing the
input directly to the output without clocking. The three
phase clocking is not required for this circuit, so the
clock runs all terminate within the chip frame without
connections, as shown in Figure 2.2. The three phase clock
must be specified in the algorithm, however, and the clock
traces are produced whether they are used or not. Note that
the pads are placed around only three sides of the chip,
and the clock pads are also placed in the order specified in
the driver algorithm (Figure 2.1). Furthermore, neither the
clock traces nor the signal lines take a direct route to
their destinations. Even though these lines are all metal, the
excess length reduces maximum chip speed because of the added
capacitance. This topic will be treated in a later
chapter. The data path Vdd-ground comb does not connect with
the Vdd rail at bottom left on the stipple plot. This is
common with very small data path chips, and the error can be
corrected in Caesar or a similar VLSI graphics editor.

Figure 2.2 Wire.cif

2.    A  Data  Path  Inverter 

The next program, macnot.mac, shown in Figure 2.3,
specified a logical NOT function. As expected, MacPitts used
a single inverter of 4:1 ratio in the data path. The input,
which is on the top left diffusion line in Figure 2.4, runs
to the gate of the NMOS inverter via a metal and diffusion
routing, and the inverted output comes out on a polysilicon
line from the far right of the circuit. It was also noted
that the logical integer specification is required for NOT,
i.e., one must use [word-not] rather than [not]. The reason
for this is given in Southard [Ref. 6: pp. 47-48], which
indicates that integer logical operators must be used on
word elements (ports and registers), and Boolean logical
operators on control elements (flags and signals). The
logical Boolean specification [not] is used on flags, input
signals, and internal signals, but it is not used for input
ports or register contents. In either Boolean or integer
data types, the NOT function takes a single value, as would
be expected.

The syntax of the driver algorithm (the .mac file)
is data-type sensitive, in much the same way that Fortran is
sensitive to the integer and floating point data types. The
two data types (from the programming perspective) are
Boolean and integer. Each data type is treated differently
by the MacPitts compiler, and each requires a different
syntax for the equivalent function. An example will clarify
this distinction:


FUNCTION    DATA TYPE    ALGORITHMIC STATEMENT

NOT         Boolean      (not a)
NOT         integer      (word-not a)
AND         Boolean      (and a b)
AND         integer      (word-and a b)


The  fundamental  difference  in  data  types  is 
argument  length.  Boolean  data  are  of  single  bit  length, 
whereas  integer  data  are  of  word  length  (one  bit  or 
greater).  Integer  type  data  operations  all  occur  in  the 
data  path  of  a  MacPitts  design,  and  Boolean  operations  all 
occur  in  the  control  path. 

In Figure 2.3, the data type is declared in the
DEF statement, the form of which is

(def <name> <function> <input, output, or internal>
     <pin number(s)>)

;MACNOT.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<not> FUNCTION BY MACPITTS SILICON COMPILER
(program macnot 1
  (def 1 ground)
  (def a port input (2))
  ;ain=input//res=output
  (def b port output (3))
  (def 4 phia)
  (def 5 phib)
  (def 6 phic)
  ;must show 3-phs clk, even if not used
  (def 7 power)
  (always
    (setq b (word-not a))))

Figure 2.3 Macnot.mac
Figure 2.4 Data Path Inverter

where the name is any ASCII character string, and the function
can be either port, signal, register, or flag. The next
field determines where the data is applied, and for most
circuits is either input or output. The pin number is
required for all input and output data. The data type is
determined by the function field. Signals and flags are
Boolean data; ports and registers are integer (word length)
data. The subsequent MacPitts forms in the driver algorithm
must agree in type with the DEF declarations.

If an incorrect data type specification is used,
MacPitts generates an appropriate error diagnostic at
compilation time. For instance, if one were to define the
inputs hot and cold as Boolean type and attempt integer
operations on them as follows

(def hot signal input 5)
(def cold signal input 6)

(setq warm (word-nor hot cold))

the following diagnostic would result at compilation time:

Error: logical coercion to integer not implemented yet

Similarly, if Boolean operations are attempted on integer
data, the following diagnostic results at compilation time:

Error: Boolean conversion not implemented yet
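
The type rule described above can be sketched in Python. This is only an illustration and is not taken from the MacPitts source: the function name check_form and the two lookup tables are hypothetical. Only the operator names, the four DEF functions, and the two diagnostic strings come from the text.

```python
# Hypothetical sketch of the MacPitts data-type rule; not compiler source.

# Data type implied by the <function> field of a DEF statement.
DEF_TYPES = {"signal": "boolean", "flag": "boolean",
             "port": "integer", "register": "integer"}

# Operand type each operator expects (only operators named in the text).
OP_TYPES = {"not": "boolean", "and": "boolean",
            "word-not": "integer", "word-and": "integer",
            "word-or": "integer", "word-nor": "integer"}


def check_form(op, operands, defs):
    """Return None if operand types match the operator, else a diagnostic."""
    wanted = OP_TYPES[op]
    for name in operands:
        actual = DEF_TYPES[defs[name]]
        if actual != wanted:
            if wanted == "integer":
                return "Error: logical coercion to integer not implemented yet"
            return "Error: Boolean conversion not implemented yet"
    return None


# hot and cold are signals (Boolean); count is a register (integer).
defs = {"hot": "signal", "cold": "signal", "count": "register"}
print(check_form("word-nor", ["hot", "cold"], defs))
print(check_form("not", ["count"], defs))
print(check_form("and", ["hot", "cold"], defs))
```

The first call reproduces the hot/cold case from the text, the second applies a Boolean operator to a register, and the third is a well-typed form.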


MacPitts error diagnostics can be quite confusing
to the inexperienced user. It is suggested that one peruse
the lincoln.lisp, hl.grep, and compmesg.lisp files of the
MacPitts source code to gain insight into the cause of
specific diagnostic messages. This can be easily done on-line
under the BSD Unix operating system. The grep feature
(pattern search and recognition) is used. The general
command format is

grep <search pattern> <file to search>

For example, if one attempted Boolean operations on a
register (an integer-valued data type) in MacPitts, the
second diagnostic given above would result. To locate the
source of this message, change directory to the residence of
MacPitts source code and issue the Unix command

grep boolean *.*

to locate all occurrences of the word boolean. Caution is
advised in issuing the grep command. If a very common word
is searched for, the search may take quite a long while, and
the results may not be very helpful. The search capability
of the grep command is limited, though, as explained in the
BSD Unix manual.
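
For readers without grep at hand, the same kind of search can be approximated in Python. The sketch below is an assumption-laden stand-in, not part of MacPitts: the function name search_sources and the two demonstration files are invented. It matches without regard to case (comparable to grep -i), so a search for boolean also finds Boolean in a diagnostic string.

```python
import re
import tempfile
from pathlib import Path


def search_sources(directory, pattern):
    """Yield (file name, line number, line) for each line matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path.name, number, line.strip()


# Demonstration on a throwaway directory; file names and contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "example-a.lisp").write_text(
        '(defun f ())\n(msg "Boolean conversion not implemented yet")\n')
    Path(tmp, "example-b.lisp").write_text("(defun g ())\n")
    hits = list(search_sources(tmp, "boolean"))
    for hit in hits:
        print(hit)
```

Reporting the file name and line number with each hit makes it easier to jump to the routine that emits a given diagnostic.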

3.    A  Data  Path  OR  Gate 

Next  a  MacPitts  routine  was  written  to  generate  a 
two  input  OR  gate  in  the  data  path.  Again,  the  integer  data 
specification  is  required  (see  Figure  2.5). 


;MACOR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<or> FUNCTION BY MACPITTS SILICON COMPILER//2 input gate//
(program macor 1
  (def 1 ground)
  (def a port input (2))
  ;a,b=inputs//c=output
  (def b port input (3))
  (def c port output (4))
  (def 5 phia)
  (def 6 phib)
  (def 7 phic)
  (def 8 power)
  (always
    (setq c (word-or a b))))

Figure 2.5 Macor.mac


The resulting circuit extracted from the chip is depicted
in Figure 2.6. The OR function is implemented as a NOR gate
followed by an inverter. Figure 2.7 shows the gate
equivalent of a two input data path OR structure. The two
inputs to the NOR gate come in on the left top of the
circuit, the output is then inverted to yield a logical OR
function, and the output of the inverter is routed from the
left back out on the poly line below and parallel to the
input tracks. This routing scheme (river routing) is
determined by the MacPitts source code, and the chip
designer has no control over it. All chip inputs and outputs
are routed inside the main ground bus, with little regard to
minimizing trace length (see Figure 2.2). So an OR gate in
the data path of MacPitts is constructed from a two input
NOR gate with an inverter on the output, and the inputs and
outputs all connect to the data path from the left side.

Figure 2.6 Data Path OR Gate

Figure 2.7 Gate Equivalent of Figure 2.6 (output: A + B)

4. A Data Path NOR Gate

A two input data path NOR function is shown in
Figure 2.8. The resulting circuit in Figure 2.9 shows
instantiation as a two input 8:1 NOR gate, with the inputs
A, B, at top left and the result, C, at bottom left. If two
inputs are permissible, are more? Does MacPitts know to
adjust the transistor k values for multiple input gates? A
two input NOR chip was specified in the algorithm, and
MacPitts created a two input NOR gate. So explicit circuit
specification has been realized so far in the MacPitts chip
data path. When the algorithm specifies a NOR function, a
NOR gate is instantiated. As will be discussed later, this
is not the case in the control path.


;MACNOR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<nor> FUNCTION BY MACPITTS SILICON COMPILER//2 input gate//
(program macnor 1
  (def 1 ground)
  (def a port input (2))
  ;a,b=inputs//c=output
  (def b port input (3))
  (def c port output (4))
  (def 5 phia)
  (def 6 phib)
  (def 7 phic)
  ;must show 3-phs clk, even if not used
  (def 8 power)
  (always
    (setq c (word-nor a b))))

Figure 2.8 Macnor.mac


Figure  2.9  Data  Path  NOR  Gate 


5.    A  Four  Input  NOR  Structure  In  The  Data  Path 

Figure 2.10 shows the MacPitts algorithm to
generate a four input NOR structure (not the functional
equivalent of a four input NOR gate) in the data path. The
MacPitts form used was

(setq out (word-nor a (word-nor b (word-nor c d))))

where setq is the LISP assignment operator, out is the
output port, a, b, c, and d are the inputs, and all data is of
integer (word) type. The prefix-operator nature of LISP
syntax [Ref. 6: p. 47] indicates the logical operation which
this gate will perform. Figure 2.11 shows the layout of the
circuit MacPitts produces from this algorithm, and Figure
2.12 depicts the gate-level equivalent.

Note the topology: two inputs to the first NOR
gate, its output and another input to the next NOR gate, and
repetition to the third level. The output comes from the
last (rightmost) NOR gate.

This structure will not be the functional
equivalent of a four input NOR gate. As the LISP-like syntax
suggests, the NOR of four inputs is not equivalent to the
cascading of two input NORs.
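
The non-equivalence can be checked by exhausting the sixteen input combinations. The Python sketch below is illustrative only; the helper names nor and cascaded_nor are assumptions, and single-bit inputs stand in for the one bit wide ports of the figure.

```python
from itertools import product


def nor(*inputs):
    """NOR of any number of 0/1 inputs."""
    return 0 if any(inputs) else 1


def cascaded_nor(a, b, c, d):
    """Structure described by (word-nor a (word-nor b (word-nor c d)))."""
    return nor(a, nor(b, nor(c, d)))


# Collect every input combination where the cascade and a true
# four input NOR gate disagree.
mismatches = [bits for bits in product((0, 1), repeat=4)
              if cascaded_nor(*bits) != nor(*bits)]
print(len(mismatches), "of 16 input combinations differ")
for bits in mismatches:
    print(bits)
```

The cascade reduces to (not a) and (b or not (c or d)), so it disagrees with the four input NOR whenever a is 0 and b is 1.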


;FOURNOR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<nor> STRUCTURE BY MACPITTS SILICON COMPILER//4 inputs//
(program fivnor 1
  (def 1 ground)
  (def a port input (2))
  (def b port input (3))
  (def c port input (4))
  (def d port input (5))
  (def e port input (6))
  (def outr port output (7))
  (def 8 phia)
  (def 9 phib)
  (def 10 phic)
  (def 11 power)
  (always
    (setq outr
      (word-nor a (word-nor b (word-nor c d))))))

Figure 2.10 Fournor.mac

Figure 2.11 Data Path Fournor Circuit

Figure 2.12 Gate Equivalent of Fournor Circuitry


6.    A  Data  Path  AND  Gate 

These observations raise the question of how a two
input data path AND gate would be constructed by MacPitts.
The (word-and x y) integer expression is required to
implement this circuit algorithmically, and a reasonably
compact circuit is expected. Figure 2.13 shows the MacPitts
algorithm to create the two input bit AND function in the
data path.

;MACAND.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<AND> FUNCTION BY MACPITTS SILICON COMPILER//2 input gate//
(program macand 1
  (def 1 ground)
  (def a port input (2))
  (def b port input (3))
  (def c port output (4))
  (def 5 phia)
  (def 6 phib)
  (def 7 phic)
  (def 8 power)
  (always
    (setq c (word-and a b))))

Figure 2.13 Macand.mac


The AND chip is implemented as a two input 4:1 NAND
gate, the output of which drives a 4:1 inverter. The
stipple plot of this circuit is shown in Figure 2.14, and
its gate level equivalent is shown in Figure 2.15. In
Figure 2.14 note the input similarities to the previous
circuits. The two inputs enter the organelle at top left,
the signal is routed to the gate, and the output exits the
organelle on the bottom polysilicon line at the left. Also
note the difference among layouts of the MacPitts NAND gate
and the MacPitts NOR gate, and the corresponding Mead-Conway
cells [Ref. 4: p. 17].

Figure 2.14 Data Path AND Gate

Figure 2.15 Gate Equivalent of Data Path AND Gate

7.  A  Three  Input  AND  Structure  In  The  Data  Path 

The three input AND was expected to produce gates
similar to those of the two input AND: a series of cascaded
NAND gates, each followed by an inverter. Figure 2.16 shows
the algorithm for the three input AND circuit, and Figure
2.17 depicts the resulting layout. The circuit is functionally
equivalent to a three input AND gate because AND is associative.

8. Data Path Basic Organelles

When a MacPitts source algorithm is invoked by the
linked binary MacPitts image by issuing the command

macpitts <filename> <options>

LISP object code is generated (unless the noobj option is
specified, in which case MacPitts searches for a previously
created object file named <filename>.obj). In the filename.obj
file it is observed that the data path logical operations
are all derived from NOT, NAND, and NOR LISP operations.
This is because the fundamental hardware building blocks of
MacPitts data path combinational logic are two input NAND
and NOR gates, and NOT gates (inverters). Knowing this, the
reason for the two-input gate implementation as depicted in
the previous figures becomes clear.

Any data path logic organelle is composed of these
primitives. The OR organelle is a NOR gate with an inverter
on its output. The AND organelle is a NAND gate with an
inverter on its output. In the data path, these organelles
are assembled into macros in the organelles.lisp file of the
MacPitts source code. The process of silicon compilation is
thereby shortened, since some of the constituent parts are
already put together.

A two input data path NAND gate chip is implemented
exactly as it is specified. A three input NAND structure is
implemented as expected, by cascading two NAND organelles
(the three input NAND structure is not functionally
equivalent to a three input NAND gate). The output, again,
is what the LISP parenthesized notation would lead one to
expect.
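
The composition of organelles from the three primitives can be modeled at the logic level. The Python sketch below is an illustration under stated assumptions: the function names are invented, the nesting chosen for the three input NAND structure is one possible ordering, and nothing here reflects the contents of organelles.lisp.

```python
from itertools import product


# The three data path primitives named in the text.
def inv(a):
    return 1 - a


def nand2(a, b):
    return 1 - (a & b)


def nor2(a, b):
    return 1 - (a | b)


# Organelles built from the primitives.
def and_organelle(a, b):
    return inv(nand2(a, b))


def or_organelle(a, b):
    return inv(nor2(a, b))


triples = list(product((0, 1), repeat=3))

# Cascaded AND organelles match a true three input AND.
and_ok = all(and_organelle(and_organelle(a, b), c) == (a & b & c)
             for a, b, c in triples)

# Cascaded NAND gates do not match a true three input NAND.
nand_diff = [t for t in triples
             if nand2(nand2(t[0], t[1]), t[2]) != 1 - (t[0] & t[1] & t[2])]

print(and_ok, len(nand_diff))
```

The AND cascade agrees on all eight input combinations, while the NAND cascade disagrees with a true three input NAND on four of them, which is the behavior the parenthesized notation predicts.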

9. Bit Slice Combinational Logic

So far, all examples given have used inputs having
one bit, but the data type specification for data path
combinational logic is integer. Word size data inputs are
treated in the expected way. Figure 2.18 illustrates a
routine which performs the logical AND on two input vectors
each four bits wide. Notice the similarity of this MacPitts
program to those already given. The only differences between
this routine and the AND of two bits are the PORT
statements, which make logical and connective assignments
between I/O ports and inter-chip hardware blocks.


;3AND.MAC
;SOURCE CODE FOR 3 INPUT DATA PATH <AND> GATE
(program 3and 1
  (def 1 ground)
  (def a port input (2))
  (def b port input (3))
  (def c port input (4))
  (def d port output (5))
  (def 6 phia)
  (def 7 phib)
  (def 8 phic)
  (def 9 power)
  (always
    (setq d (word-and (word-and a b) c))))

Figure 2.16 3and.mac

Figure 2.17 Circuitry from 3and.cif


Figure 2.19 illustrates the data path circuitry
which implements this logic. It is evident that the logic is
performed by replications of the fundamental MacPitts AND
organelle, a NAND gate with inverted output. In comparing
this circuit to Figure 2.14 the similarity becomes clear.
The word-and integer operation as specified in the source
algorithm translates to a data path AND organelle in the
LISP object file. This organelle is replicated,
instantiated, and connected to inputs and outputs to create
the circuit (cifplot) shown in Figure 2.19. This data path
word operation capability would not usually be applied to
bit-width combinational logic, as the previous discussions
might suggest, but rather to bit-slice operations such as
word masking, parity checks, arithmetic operations, and so
on.
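
The replication of one organelle per bit position can be illustrated with a short Python model. It is a sketch only: the names and_organelle and word_and are assumptions, and words are written as lists of bits with the most significant bit first.

```python
def and_organelle(a, b):
    """One-bit AND organelle: a NAND gate followed by an inverter."""
    nand = 1 - (a & b)
    return 1 - nand


def word_and(a_bits, b_bits):
    """Apply one AND organelle per bit position of two equal-width words."""
    if len(a_bits) != len(b_bits):
        raise ValueError("words must have the same width")
    return [and_organelle(a, b) for a, b in zip(a_bits, b_bits)]


# Masking example: keep only the low two bits of a four bit word.
word = [1, 0, 1, 1]
mask = [0, 0, 1, 1]
print(word_and(word, mask))
```

Each list position corresponds to one replicated organelle, which is why the layout grows with word width while the algorithm text stays the same apart from the port declarations.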

10. Two Data Path Chips: Counters

A four bit resettable up-counter chip was designed
by MacPitts using an algorithm given in the MacPitts
documentation. Figure 2.20 shows the algorithm to specify
the counter's behavior, and Figure 2.21 shows the resulting
chip layout diagram. This example gives an indication of the
implicative nature of MacPitts, which is actually a function
of the LISP object code. There is a bank of three vertical
drivers below the data path block in Figure 2.21. These are
clock drivers, which drive the three phase clock.


;MULTIAND.MAC
;SOURCE CODE FOR ALGORITHMIC ...
;<AND> FUNCTION BY MACPITTS ...
(program multiand 4
  (def 1 ground)
  (def a port input (2 3 4 5))
  (def b port input (6 7 8 9))
  (def c port output (10 11 12 ...
  (def 14 phia)
  (def ... phib)
  (def ... phic)
  (def ... power)
  (always
    (setq c ...

Figure 2.18 Multiand.mac

Figure 2.19 Logic Circuitry From Multiand.cif


;Example of MACPITTS algorithm to create a 4 bit counter
;Illustrates use of "always" and "cond" commands
;title: count4.mac
(program count4 4
  (def 16 power)
  (def 1 ground)
  (def 2 phia)
  (def 3 phib)
  (def 4 phic)
  (def rst signal input 5)
  (def count register)
  (def cnt_up signal input 6)
  (def ld_zero signal input 7)
  (def out port output (12 13 14 15))
  (always
    (cond
      (ld_zero
        (setq count 0))
      (cnt_up
        (setq count (1+ count))))
    (setq out count)))

Figure 2.20 Count4.mac

They   connect   to  the  clock  lines  on  the  bottom  and   to   the 
count  registers  at  the  top. 

There is a small Weinberger array beneath the clock
drivers. A Weinberger array [Ref. 8] is used by MacPitts to
control data path operations. It can be inferred from the
size comparison between the data path block and the control
block that this is a data intensive chip. The MacPitts
algorithm reflects this, with many data operations such as
SETQ and (1+ count), the increment statement, and few control
operations such as

(cond (<conditional> <actions>) ...)

where each <conditional> requires a decision. This
decision making is perhaps more obvious in the generated
object file, where each COND statement is translated to an
IF statement. MacPitts implements the decisions more along
the lines of a Pascal CASE construction than as an IF
construction (the compiled LISP code reflects the IF logical
testing, but it is set within a parallelizing command).

Figure 2.21 Count4.cif

The SETQ form has operated on just ports so far. In
count4.mac, the SETQ form operates on a register (COUNT, the
current counter value). The last line in the algorithm,
(setq out count), sets the output port to the current count
register value. From the hardware perspective, this can be
viewed as a latching or storage of the register contents,
and clocking the contents to an output port. This is
necessary in MacPitts since ports cannot store data. Only
registers can store data in the data path, and MacPitts
implements registers as master-slave flip flops.

The chips considered so far, with the exception of
count4.mac, have been pure data path chips. In almost all
useful chips, there will be a data path which is controlled
by a Weinberger array control path. It is difficult to guess
the relative sizes of the data path and control path from
just the MacPitts driver algorithm. Nevertheless, if few
conditional decisions are to be made and many arithmetic or
logical operations are to be performed, the data path is
likely to be the larger.


Figure 2.22 shows the algorithm (the .mac file)
for count16ud.mac, the MacPitts driver for a 16 bit up/down
counter. The signal and register names are self explanatory.
The previous four bit up-counter was the prototype for this
16 bit up/down counter. The differences are in word length,
the addition of a new input signal (count_down), the
conditional test of count_down, and the decrement operation
(1- count) if count_down is asserted true. It is usually a
good idea to model a desired algorithm with a simpler
prototype (functionally similar but having fewer inputs and
outputs), and to test the prototype in the MacPitts command
interpreter. For example, designing a four bit up counter is
a good preliminary step when a 16 bit up/down counter is
desired.

It can be inferred that the ratio of data path to
control path size will be greater for this chip than for
count4.mac. Figure 2.23 shows the resulting cifplot of
count16ud.mac, and the 16 bit wide data path is indeed much
larger than the control path, and as expected, much larger
than the four bit counter data path also.
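
The intended behavior of the up/down counter can be prototyped in software before compilation. The Python sketch below is a behavioral model under stated assumptions, not MacPitts output: the function names are invented, guards are tested in order with only the first true guard acting (as the comments in count16ud.mac describe), wraparound at the word boundary is assumed, and the timing of the output port relative to the register is not modeled.

```python
def counter_step(count, ld_zero, cnt_up, cnt_dn, width=16):
    """Return the next register value for one clock cycle.

    Guards are tested in order and only the first true one acts.
    Wraparound at the word boundary is an assumption of this sketch.
    """
    mask = (1 << width) - 1
    if ld_zero:
        return 0
    if cnt_up:
        return (count + 1) & mask
    if cnt_dn:
        return (count - 1) & mask
    return count  # no guard true: the register holds its value


def run(inputs, width=16):
    """Apply (ld_zero, cnt_up, cnt_dn) tuples; return the register value after each cycle."""
    count = 0
    trace = []
    for ld_zero, cnt_up, cnt_dn in inputs:
        count = counter_step(count, ld_zero, cnt_up, cnt_dn, width)
        trace.append(count)
    return trace


# up, up, down, up and down together (up wins), then load zero
print(run([(0, 1, 0), (0, 1, 0), (0, 0, 1), (0, 1, 1), (1, 1, 0)]))
```

Passing width=4 gives a model of the four bit prototype, which mirrors the suggestion of testing a small version before committing to the 16 bit design.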


;Example of MACPITTS algorithm to create a 16 bit up/down counter
;copiously commented for clarity's sake
;title: count16ud.mac
(program count16ud 16
;note that the 16 opposite the title determines # of outputs
;doc. says data paths; actually equates to output pads (NOT paths)
;following 5 lines necessary every pgm:
(def 1 ground)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def 25 power)
;the counter will require a 16 bit width storage register (McP= m/s FF)
;... a count up enable signal,
;... a count down enable signal,
;... and a reset signal. These are described syntactically below:
(def rst signal input 5)
;this declares a bank of 16 clocked m/s FFs (see stippleplot)
(def count register)
(def cnt_up signal input 6)
(def cnt_dn signal input 7)
(def ld_zero signal input 8)
;the 16 output pads are specified:
(def out port output (9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24))
;always command means to execute what follows every clock cycle
(always
;the cond (-ition) statement means to check the following guard
;conditions, and execute ONLY that one which is .true.
;execution of one guard precludes execution of any subsequent guards.
(cond
;there are three guards to check: is ld_zero .true.?
;if not, is cnt_up .true.?
;if not, is cnt_dn .true.?
;if neither is .true., then exit the loop
(ld_zero
;if ld_zero is asserted (high), then make count=0 (i.e., clr FFs)
(setq count 0))
(cnt_up
;if cnt_up is asserted (high), then increment the count FF bank
(setq count (1+ count)))
;if cnt_dn is asserted (high), then decrement the count FF bank
(cnt_dn
(setq count (1- count))))
;regardless of which (if any) operation is done, the FF contents
;are assigned to the output with the setq command.
(setq out count)))


Figure 2.22 Count16ud.mac
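The behavior described by the counter algorithm can be prototyped in software before it is tried in the MacPitts command interpreter. The following Python sketch is only an illustrative model, not MacPitts output. It assumes that the 16 bit register wraps around modulo 2^16 on overflow and underflow, and that the register holds its value when no guard is true; the names next_count and run are hypothetical.

```python
WORD_BITS = 16                      # word length given in the program statement
MASK = (1 << WORD_BITS) - 1         # assumption: arithmetic wraps modulo 2**16

def next_count(count, ld_zero, cnt_up, cnt_dn):
    """Return the register value after one clock cycle."""
    # Guards are taken in source order; only the first true guard acts.
    if ld_zero:
        return 0
    if cnt_up:
        return (count + 1) & MASK
    if cnt_dn:
        return (count - 1) & MASK
    return count                    # no guard true: the register keeps its value

def run(steps, count=0):
    """Apply (ld_zero, cnt_up, cnt_dn) tuples in order; return the register trace."""
    trace = []
    for ld_zero, cnt_up, cnt_dn in steps:
        count = next_count(count, ld_zero, cnt_up, cnt_dn)
        trace.append(count)
    return trace
```

For example, run([(0, 1, 0), (0, 1, 0), (0, 0, 1), (1, 0, 0)]) counts up twice, down once, and then clears the register.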
Figure 2.23 Count16ud.cif

B.   COMBINATIONAL LOGIC STRUCTURES IN THE CONTROL PATH

The implementation of combinational logic in the control
path of a MacPitts design is fundamentally different from
its implementation in the data path.

In the data path, all combinational logic is constructed
from basic two input NOR, NAND, and NOT cells, as described
in the MacPitts source code file data-path.lisp. Any
logical implementation, however complicated, is constructed
from these three organelles (other organelles do exist in
the organelles.l file, but they are all built either
from these basic cells or from permutations of these cells).

Furthermore, the specifications required by MacPitts in
the data path are more oriented towards structure than
behavior. For instance, when the programmer/designer writes
the following algorithmic fragment

(word-and a (word-and b c))

what is being explicitly specified is a two-level gate
structure. The innermost level comprises a two-input AND
gate, the output of which is fed to the input of the second
level AND gate, in parallel with the third input. Note that
a single gate with more than two inputs is not permitted in
the data path. The syntax constraints of the MacPitts
compiled object code determine this structure. Again, this
apparent limitation is not really a limitation at all
because MacPitts is so constructed as to force decisions to
be made in the control path. Consequently, the necessity of
Boolean algebraic reduction in the data path combinational
logic is highly unlikely.
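The way a multi-input data path function must be written as nested two-argument forms can be sketched as follows. This Python fragment is a hypothetical illustration, not part of MacPitts: nested_form builds the nested text for a list of operand names, gate_levels counts the levels of a linear cascade, and word_and models one level on 16 bit words.

```python
from functools import reduce

def word_and(x, y, width=16):
    """Model of one two-input AND level acting on width-bit words."""
    mask = (1 << width) - 1
    return (x & y) & mask

def nested_form(op, args):
    """Return the nested two-argument form, e.g. (op a (op b c))."""
    if len(args) < 2:
        raise ValueError("need at least two operands")
    innermost = f"({op} {args[-2]} {args[-1]})"
    # Wrap the remaining operands around the innermost form, right to left.
    return reduce(lambda inner, name: f"({op} {name} {inner})",
                  reversed(args[:-2]), innermost)

def gate_levels(n_inputs):
    """Number of cascaded two-input gate levels for n_inputs operands."""
    return n_inputs - 1
```

With three operands, nested_form("word-and", ["a", "b", "c"]) reproduces the fragment above, and gate_levels(3) gives the two levels described in the text.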

1.    Control Path Combinational Logic

The control path implementation of combinational
logic is simpler than the data path implementation in two
ways. First, it is behavior oriented, rather than structure
oriented. The MacPitts designer needs only to specify the
MacPitts LISP-like behavior of the structure, and the
MacPitts environment produces a realization of it. This
requires little (if any) of the Boolean reduction which might
be required for complicated data path logical structures.

Second, the control path combinational logic is also
simpler structurally, in that it is always implemented in a
highly-regular Weinberger array. A tradeoff between
simplicity of layout and maximum circuit speed exists,
however, and this topic will be considered in Chapters IV
and V. Although a Weinberger array is geometrically simpler
than a Programmable Logic Array (PLA), it is not as fast or
as small.

The selection of which path is to perform the
combinational logic is inherent in the MacPitts (the
language) syntax. If the logical operator is a Boolean form
and its antecedents are signals or flags, the control path
will do the logic. If the logical operator is an integer
form and its antecedents are ports or registers, then the
combinational logic will take place in the data path. Thus,
the syntax drives the selection of where the combinational
logic occurs.
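The selection rule just stated can be written down as a small lookup. The Python sketch below is an illustration under stated assumptions: select_path is a hypothetical helper, and the operator sets contain only the forms named in this chapter, not the complete MacPitts operator list.

```python
# Only the operators mentioned in this chapter; the real language has more.
BOOLEAN_FORMS = {"and", "or", "nand"}
INTEGER_FORMS = {"word-and", "word-or", "word-nand", "1+", "1-"}
BOOLEAN_OPERANDS = {"signal", "flag"}
INTEGER_OPERANDS = {"port", "register"}

def select_path(operator, operand_kinds):
    """Return 'control' or 'data' according to operator form and operand types."""
    kinds = set(operand_kinds)
    if operator in BOOLEAN_FORMS and kinds <= BOOLEAN_OPERANDS:
        return "control"
    if operator in INTEGER_FORMS and kinds <= INTEGER_OPERANDS:
        return "data"
    # Mixed or unknown combinations are rejected in this sketch.
    raise ValueError(f"no path for {operator!r} with operands {sorted(kinds)}")
```

For instance, select_path("and", ["signal", "flag"]) returns "control", while select_path("word-and", ["port", "register"]) returns "data".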

The initial MacPitts documentation offered some
insight into these distinctions. A variety of tests were
devised in the current investigation to explore the
combinational logic implementation differences between the
data and control paths. The experiments designed to arrive
at the above conclusions for the control path logic are
presented in the following sections.

2.    A Control Path AND Gate, And Control Path Syntax

Casand.mac (cascaded AND gates; Figure 2.24) was
the algorithm used to create the initial structure for exploring
combinational logic implementation in the control path. The
control path implementation of combinational logic requires
a different kind of input declaration than does the data
path. In the control path, the inputs must be declared as

<name> signal input <pin number>

This has the effect of coercion to Boolean (true or false,
as opposed to one and zero) in the MacPitts environment.

Consequently, a different type of logical operator
is required in the SETQ argument forms. In the data path,
using defined-integer ports as inputs, the integer logic
SETQ forms are used (word-or, word-nand, etc.). In the
control path, however, Boolean SETQ forms are required (or,
nand, etc.). The data path integer SETQ forms are limited to
two logical arguments, whereas the control path SETQ forms
are effectively unlimited as to number of logical arguments.
This seemingly arbitrary constraint becomes understandable
in view of the structural implementation in the respective
paths. In the data path, all logic must be implemented by
cascades of two input gates. In the control path, all logic
is implemented by a Weinberger array, which has no practical
limit (except speed, pin count, and chip size) on the number
of inputs.

Furthermore, the data path combinational logic
restrictions are less strict (structurally speaking) than
are those of the control path. For instance, in
the data path all combinational logic structures are derived
from NAND, NOR, and NOT gates, and implemented as macro
organelles. In the control path, however, all logic
structures are constrained to be NOR gates. The basename.obj
file that results from a basename.mac file indicates all
control path combinational logic implemented as NOR
operations. Figure 2.25, casand.obj, shows the NOR function
used to perform the AND function in the control path. All
control path combinational logic operations are implemented
in this fashion, as in the more common PLA.


;CASAND.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<and> FUNCTION BY MACPITTS SILICON COMPILER//2 input gate//
(program casand 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)
(def c signal output 7)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def 8 power)
(always
(cond (a
(setq c (and a b)))
(b
(setq c (and a b))))))

Figure 2.24 Casand.mac


(destination c)
(source a)
(source b)
(logo casand)
(word-length 1)
(ground 1)
(signal a input 5)
(signal b input 6)
(signal c output 7)
(phia 2)
(phib 3)
(phic 4)
(power 8))
1
1
((signal-output c) (nor ((primitive (gate 4)))))
((gate 4) (nor ((primitive (gate 3)) (primitive (gate 2)))))
((gate 3)
(nor
((primitive (gate 1)) (primitive (gate 0)) (primitive (signal-input a)))))
((gate 2) (nor ((primitive (gate 1)) (primitive (gate 0)))))
((gate 1) (nor ((primitive (signal-input a)))))
((gate 0) (nor ((primitive (signal-input b))))))
(4 (phic))
(3 (phib))
(2 (phia))
(1 (ground))
(8 (power))
(6 (input b (signal-input b)))
(5 (input a (signal-input a)))
(7 (outputs c (signal-output c)))))

Figure 2.25 Casand.obj
The AND plane in an NMOS PLA is actually composed of NOR
gates: its function is logical AND, but its constituent
circuits are NOR gates. The NOR structure which the control
path uses is also different topologically from that used in
the data path.
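That a network of NOR gates alone realizes the AND function can be checked by evaluating the gate list of Figure 2.25. The Python sketch below is an illustrative check, not MacPitts code. The netlist is transcribed from the figure under the assumption that the illegible second input of gate 2 is gate 0; the function names are hypothetical.

```python
from itertools import product

# Each node is the NOR of the listed sources (transcribed from Figure 2.25;
# the second input of gate 2 is assumed to be gate 0).
NETLIST = {
    "gate0": ["b"],
    "gate1": ["a"],
    "gate2": ["gate1", "gate0"],
    "gate3": ["gate1", "gate0", "a"],
    "gate4": ["gate3", "gate2"],
    "c": ["gate4"],
}

def evaluate(node, inputs):
    """Evaluate a node; primary inputs are looked up, gates are NORs."""
    if node in inputs:
        return inputs[node]
    return not any(evaluate(src, inputs) for src in NETLIST[node])

def depth(node):
    """Number of NOR levels between the primary inputs and this node."""
    if node not in NETLIST:
        return 0
    return 1 + max(depth(src) for src in NETLIST[node])

def truth_table():
    """Map every (a, b) input pair to the value of output c."""
    return {(a, b): evaluate("c", {"a": a, "b": b})
            for a, b in product([False, True], repeat=2)}
```

Under this reading, the output equals a AND b for all four input combinations, gate 3 is never true, and the longest path from input to output passes through four NOR levels.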

A concise review of the data path and control path
variable types illustrates the usage differences:


                                    DATA TYPE
                     BOOLEAN (true, false)      INTEGER (word valued)

STORAGE ELEMENT      flag                       register

NON-STORAGE ELEMENT  signal (input, internal)   port (all types)


All storage elements are implemented as master-slave
flip-flops. They retain their value until a new value
is clocked into them. The flags are one bit wide, and are
two-state devices, either true or false. The registers have
a capacity equal to the data path width as declared in the
initial PROGRAM statement in the MacPitts source program
written by the chip designer.

Non-storage elements are used primarily for data
communication within a clock cycle, where clock cycle here
is taken to refer to the command interpreter clock cycle,
and not one of the three off-chip clock phases which a
MacPitts design requires. The determination of the value of
these non-storage elements is germane to pipelined digital
machines. When used in any application, care must be taken
so that their value is the one necessary for subsequent
stages of logic. A thorough understanding of the
counter-intuitive parallelism inherent in MacPitts (the language) is
necessary to avoid mistakes here. MacPitts is not like the
standard sequentially executed higher level languages. There
are at least three levels of implicit parallelism possible
in a MacPitts algorithm, and an understanding of parallel
operations is necessary to avoid functional errors. This
consideration is germane to MacPitts programming, and will
be considered in detail later in this Chapter and in
Chapters III and IV.

The next-to-last line in Figure 2.24 illustrates a
conditional. The (b ... statement is a checked (condition)
argument of the beginning COND (do upon condition)
statement, as is (a ... . If condition a is false, and
condition b is false, then no output is SETQ'd. Intuition
would suggest that the output would then either remain at
its last value or transition to tri-state, neither of which
is correct. The output is pulled low by the Weinberger array
circuitry. This is evident in Figure 2.27, the Weinberger
array from casand.mac, and in Figure 2.28, the logic gate
equivalent. The (cond (t ... form can be used to set a
desired output, but it is usually better suited as a default
conditional.


Figure 2.26 Casand.cif

Figure 2.27 Casand.cif Weinberger Array
MacPitts does not view this algorithm as a usual
high level sequential test, however, but rather as a
parallel test of a and b. The non-intuitive parallelism of
MacPitts was mentioned in the previous paragraph, as was the
similarity of the MacPitts COND statement to the Pascal CASE
statement. Some elaboration will serve to clarify this
necessary concept. MacPitts evaluates all of the forms
within the scope of a COND statement in parallel, in a
mutually exclusive fashion. With regard to mutual
exclusivity, it is then similar to the CASE statement; each
condition under the scope of a COND can be modelled as a
flow-of-control switch, either turning on the evaluation of
its constituent forms or else skipping over their
evaluation. The analogy does not hold further than this,
however, because MacPitts evaluates all of the conditions
under a COND in parallel. The object code created from a
MacPitts source file illustrates this well. An example is


(cond
(hot (setq fan_on t))
(cold (setq fan_on f))
(ok (setq fan_on f)))


where hot, cold, and ok are Boolean variables
(signals or flags), and fan_on is in this case a Boolean signal
output which is to be turned on (t) or off (f) depending on
an input temperature signal. COND forces parallel evaluation
of the three conditions under its scope: hot, cold, and ok.
The last parenthesis in this fragment closes the beginning
parenthesis prior to the COND, bringing the three conditions
under its scope. Since these conditions are evaluated in
parallel, a better code fragment would be


(cond
(hot (setq fan_on t))
(cold (setq fan_on f))
(t (setq fan_on f)))


where the last line indicates TRUE, i.e., it is always true.
Since COND evaluates in parallel with mutual exclusion based
upon order, if either of the first two conditions is true,
then the remaining conditions are not evaluated. If neither
of the first two conditions is true, however, then the fan
will be turned off. This code fragment permits one less
signal input (or one less flag used) on the chip, and use of
the TRUE t condition should always be considered. Its use is
not necessary, as indicated by the first code fragment.
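The combination of parallel testing and order-based mutual exclusion can be modelled by computing, for every clause at once, an enable that is true only when its own guard is true and no earlier guard is true. The Python sketch below is an assumed behavioral model of the second fan fragment, not the MacPitts implementation; cond_enables and fan_on are hypothetical names, and the default-low output follows the observation above that an undriven control path output is pulled low.

```python
def cond_enables(guards):
    """Return one enable per clause: guard true and no earlier guard true."""
    enables = []
    earlier = False
    for guard in guards:
        enables.append(bool(guard) and not earlier)
        earlier = earlier or bool(guard)
    return enables

def fan_on(hot, cold):
    """Model of (cond (hot (setq fan_on t)) (cold (setq fan_on f)) (t (setq fan_on f)))."""
    clause_values = [True, False, False]       # value each clause assigns
    enables = cond_enables([hot, cold, True])  # the t guard is always true
    # The output is high only if an enabled clause drives it high;
    # otherwise it is low.
    return any(en and val for en, val in zip(enables, clause_values))
```

At most one enable is ever true, so fan_on(True, True) gives True (the hot clause wins by order) and fan_on(False, False) gives False through the default clause.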

MacPitts produces an accompanying object code which
structurally resembles the following fragment


(if
(par (hot ... )
(cold ... )
(t ... ) ) )


where the COND translates to an IF, and the parallelism of
MacPitts is evident in the PAR (parallelize) embracing the
three constituent conditions under the COND. Parentheses are
as important in MacPitts as they are in LISP. In the last
line above, there are three closing parentheses. The
innermost closes the TRUE condition, the middle parenthesis
closes the PAR (parallelization of condition checking), and
the outermost closes the IF (cond) statement.

The LISP object file of casand.mac in Figure 2.25
indicates the LISP equivalent of the MacPitts (language)
algorithm, and shows how LISP views the NOR gate inputs as
primitives. MacPitts is also able to compile a chip layout
directly from a LISP object code. This is an option for the
designer who is fluent in LISP, in that customizing of the
code and hence of the chip's structure is possible. RVLSI-3
[Ref. 6:p. 4] describes how to create a chip design from an
existing LISP object file.

Figure 2.26 shows the chip resulting from
casand.obj. The pads are all placed clockwise around the
periphery of the chip in the order specified in the .mac
file (Figure 2.24). This built-in function of MacPitts
lends itself to both errors and possibilities of
improvement. It is easy to identify pad function if the
MacPitts algorithmic source file (written by the designer)
is at hand.

Figure 2.27 also shows the topological difference
between the data path and the control path. In previous data
path circuits, all combinational logic was implemented with
recognizable NMOS logic gates. In the control path, the
Weinberger array is made up of many vertical metal columns
with perpendicular polysilicon lines cutting across them.
Figure 2.28 illustrates the structure more clearly.

Figure 2.28 Gate Equivalent of Casand Weinberger Array

In Figure 2.26 the Vdd input rail did not connect
with the main Vdd bus (it has been corrected in Figure
2.26). It passes through the polysilicon vias and stops
abruptly. The reason for this is the expectation of minimum
chip size which MacPitts harbors. For any but the simplest
of chips, the Vdd comb will extend out to the input Vdd
rail. If it does not, the Vdd pad can be placed almost at
will by modifying its position in the basename.mac file.
RVLSI-3 discusses this [Ref. 6:pp. 11-13]. The designer
can exercise a fair amount of latitude in pad placement, and
MacPitts will accommodate most of the time. The suggestion
in RVLSI-3 that GND be placed near the beginning and Vdd be
placed near the end is a good one. The main problem here
would arise if GND were placed on the right side of the chip
so that it contacted the Vdd comb (which it will do if care
is not exercised). MacPitts places pads exactly in the order
specified, and does no pad functional error-checking.
Similarly, if a pad is dual-defined, MacPitts permits it
with no diagnostics. This extends to the same pad being used
for both Vdd and an input signal. So care is important in
both pad specification and positioning.
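Because MacPitts issues no diagnostic for a dual-defined pad, a simple check of the .mac file before compilation can catch the mistake. The Python sketch below is a hypothetical helper, not part of MacPitts. It assumes only the def forms that appear in the figures of this chapter: a leading pin number, as in (def 1 ground), or a name followed by one or more trailing pin numbers, as in (def a signal input 5) and (def out port output (9 10)).

```python
import re
from collections import defaultdict

# Matches one (def ...) form, allowing a single nested pin list.
DEF_FORM = re.compile(r"\(def\b((?:[^()]|\([^()]*\))*)\)")

def pad_assignments(source):
    """Map each pin number to the list of names defined on it."""
    # Strip ; comments line by line before scanning for def forms.
    text = "\n".join(line.split(";", 1)[0] for line in source.splitlines())
    pins = defaultdict(list)
    for body in DEF_FORM.findall(text):
        tokens = body.replace("(", " ").replace(")", " ").split()
        if not tokens:
            continue
        if tokens[0].isdigit():                 # (def <pin> <function>)
            pins[int(tokens[0])].append(tokens[1])
        else:                                   # (def <name> ... <pin> ...)
            for tok in tokens[1:]:
                if tok.isdigit():
                    pins[int(tok)].append(tokens[0])
    return dict(pins)

def duplicate_pads(source):
    """Return only the pins that carry more than one definition."""
    return {pin: names for pin, names in pad_assignments(source).items()
            if len(names) > 1}
```

Running duplicate_pads on the text of a .mac file returns an empty dictionary when every pad is defined once, and otherwise lists each conflicting pin with the names assigned to it.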

There exists the possibility for some improvement
in chip speed by designer intervention in specifying the pad
location. By moving pads five, six, and seven (input and
output signals in casand.mac) closer to the Weinberger
array, the metal run lengths can be reduced, and with them the
metal to substrate capacitance. This results in a somewhat
faster chip, all other factors being equal.

Figure 2.27 is a blowup of the Weinberger array
generated by casand.mac, and Figure 2.28 is its logic gate
equivalent. The Weinberger array is a versatile PLA-like
structure generally used to implement sequential logic. In
this chip, as an unclocked circuit, it implements a
combinational function. Weinberger array gate
instantiation errors were first detected here (circled).
Note the two half lambda gaps in the NOR gate inputs. By
Caesar editing, unexpanding of affected cells, and
grep-searching the .cif files it was discovered that these errors
occur whenever certain NOR gate inputs are invoked. The
errors themselves were suspected to reside in the
control.lisp file of the MacPitts source code. Two specific
cells appear to generate these errors: partial-gate-input
('ground-right), and partial-gate-input ('ground-left). Each
is one-half lambda too short. Chapter VI will treat the
solution of this problem. The MacPitts command interpreter
does not detect this type of error, since it only exercises
the algorithm. Lyra or a similar design rule checker will
detect this error. The designer would do well to visually
note MacPitts' inherent errors and correct them prior to
submission to a design rule checker (drc).

3.    A Control Path OR Gate

Figure 2.29 illustrates the MacPitts algorithm to
create a two input OR realization in the control path. The
OR function is realized by a selective SETQ choosing
process, in a similar fashion to the previous AND
realization.

Figure 2.30 is the Weinberger array logical unit of
casor.cif. The inputs are brought in on either side, and the
output comes out from the middle of the structure. The same
instantiation errors as in the previous chip were generated.
Partial-gate-input (gnd left) is depicted in the upper left
of the stipple plot, and partial-gate-input (gnd right) is
depicted in the lower right of the plot (circled) in Figure
2.30.

The logical operation of the Weinberger array could
stand some clarification. Figure 2.31 depicts a gate-level
functional representation of Figure 2.30, the control path


;CASOR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<or> FUNCTION BY MACPITTS SILICON COMPILER//2 input gate//
(program casor 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)
(def c signal output 7)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def 8 power)
(always
(cond (a
(setq c t))
(b
(setq c t)))))


Figure 2.29 Casor.mac


implementation of a two input COND-test OR structure.
The function is explained with reference to Figures 2.28,
2.30, and 2.31. Figure 2.30 has four depletion mode
transistors (control columns to MacPitts). The leftmost
transistor is the first inverter in Figure 2.31. The next
column in Figure 2.30 serves as the top NOR gate in Figure
2.31. Moving right in Figure 2.30, the next column is the
output inverter, and the rightmost column is the lower NOR
gate of Figure 2.31. When viewed as a gate
level equivalent, it can be seen that the Weinberger array
is both larger and slower than its data path equivalent (cf.
Figure 2.6). In the control path, the signal requires
approximately four gate delays to propagate from input to
output. This slowness has been somewhat mitigated by
the large aspect ratio of the pullup transistors (bottom,
Figure 2.30). The comparable logic gate in the data path
only requires approximately two gate delays, one for the NOR
gate and one for its subsequent inverter (Figure 2.7).

Figure 2.30 Casor Weinberger Array

Figure 2.31 Gate Equivalent of Casor Logic

This simple COND-driven control path OR gate serves
as an indication of how MacPitts constructs similar yet more
complicated Weinberger array structures. The decision logic
is quite unlike that of a PLA. In a standard NMOS AND plane,
OR plane PLA, a signal may experience at most four gate
delays (considering input and output inverters both active,
and pass transistors inducing a very small time delay). For
this simple OR circuit, a delay of approximately four gate
delays is already realized. The cascading of NOR gates and
inverters induces even more delay for more complicated
Weinberger array circuitry.


4.    A  Four  Input  OR  Gate  In  The  Control  Path 

A quad-input OR structure is specified
algorithmically in Figure 2.32. The OR logic which is
implicit in MacPitts specifications is perhaps clearer here
than in the two input OR structure. The COND statement
forces a Boolean test of each input, and selects the
appropriate output. To reiterate, the COND statement and its
attendant forms can be viewed as the if-then-else construct
of many higher level languages. The difference is that
MacPitts tests the condition forms in parallel, and not in a
serial fashion as most higher level software compilers
would. The mutual exclusiveness of the <conditions> is
determined by serial order, however, even though the testing
of the conditionals is done in one clock cycle (that is, in
parallel).
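That four ordered guards, each setting the output true, amount to a four input OR can be confirmed exhaustively. The Python sketch below is an assumed behavioral model of a COND whose clauses are guarded by a, b, c, and d in that order and each assign t to e; the function names are hypothetical, and the output is taken to be low when no clause is enabled, as observed earlier for the control path.

```python
from itertools import product

def quador(a, b, c, d):
    """Model of a COND with guards a, b, c, d, each clause doing (setq e t)."""
    e = False                  # no clause enabled: the output is low
    earlier = False
    for guard in (a, b, c, d):
        enabled = guard and not earlier   # mutual exclusion by source order
        if enabled:
            e = True
        earlier = earlier or guard
    return e

def equivalent_to_or():
    """Check all 16 input combinations against a plain four input OR."""
    return all(quador(*bits) == any(bits)
               for bits in product([False, True], repeat=4))
```

Only one clause is ever enabled, yet equivalent_to_or() holds for all 16 input combinations, which is why the specification reads naturally as an OR.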

;QUADOR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<or> FUNCTION BY MACPITTS SILICON COMPILER//4 input gate//
(program quador 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)
(def c signal input 7)
(def d signal input 8)
(def e signal output 9)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def 10 power)
(always
(cond (a
(setq e t))
(b
(setq e t))
(c
(setq e t))
(d
(setq e t)))))

Figure 2.32 Quador.mac


This is reflected in the resulting structures.
Figure 2.33 shows the labelled Weinberger array resulting
from quador.mac, and Figure 2.34 is its logic gate
equivalent. A strength of MacPitts is that it forces the
designer to consider both behavior and structure while in
the process of writing the driver algorithm. This is
considered to be advantageous, inasmuch as the abstractness
factor is minimal. There are two broad categories of
silicon compilers, behavior oriented (e.g., MacPitts), and
structure oriented (e.g., Bristle Blocks). In Bristle Blocks
and most other register transfer logic (RTL) silicon
compilers, a structure is the fundamental building block.
The structures (register, adder, ALU, gate) must be
connected appropriately to implement the desired behavior.
In MacPitts, the desired behavior of the chip is the input
to the silicon compiler and the chip which implements this
behavior is the output. The experienced designer is aware of
the structure that results from a given behavioral
specification, and has the latitude to optimize the
algorithm accordingly. This has been mentioned previously,
regarding pad placement and COND. Optimization will be
treated further later in this thesis.

5.    A  Four  Input  AND  Gate  In  The  Control  Path 

Figure  2.35  shows  the  algorithm  to  create  a  four 
input  AND  gate  in  the  control  path,  and  Figure  2.36  shows 
the  Weinberger  array    from  the  logic  block  of  quadand.cif. 
Figure 2.33 Quador Weinberger Array

Figure 2.34 Gate Equivalent of Quador Logic


Note the errors generated in this simple four input, one
output circuit (circled, Figure 2.36). There are seven gate
gap errors (all partial-gate-inputs), and three alignment
errors. The alignment errors are actually derived from
mistranslation of the Weinberger array interface cell by
MacPitts (the program). The interface cell is created with
the proper pitch, set aside in the VAX 11/780's memory, then
invoked and its image translated to the proper position in
the upper left of the Weinberger array. By convention, upper
left on the MacPitts chips refers to the nominal position of
the GND pad, position one.


;QUADAND.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<and> FUNCTION BY MACPITTS SILICON COMPILER//4 input gate//
(program quadand 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)
(def c signal input 7)
(def d signal input 8)
(def e signal output 9)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def 10 power)
(always
(cond (a
(setq e (and a b c d)))
(b
(setq e (and a b c d)))
(c
(setq e (and a b c d)))
(d
(setq e (and a b c d)))
(t
(setq e f)))))

Figure 2.35 Quadand.mac

So what appears to be three separate alignment
errors is actually just one cell translation error. This
error should be repairable in the macro-instantiation
portion of MacPitts, although further investigation will
also consider the possibility of an error in program
installation.

6.    A 16 Input OR Gate In The Control Path

It was stated previously that MacPitts will permit
no more than five deep cascading of the same gate organelle
in the data path. This is not the case, however, in the
control path. Figure 2.37 shows a MacPitts algorithm to
create a 16 input OR circuit. Note again how natural the
specification is, and the intuition it gives into both
behavior and structure. To reiterate: in the data path, one
specifies structure explicitly and the implicit behavior
results. In the control path, one specifies behavior
explicitly, and the implied structure (always a Weinberger
array) results (cf. Figure 2.13, data path AND, and Figure 2.25,
control path AND). The suggestion is to specify as much
combinational logic as possible in the control path (this
decision fortunately never arises because MacPitts is not
primarily a combinational logic design tool).

In program multior.mac the data path width is still
one. In the absence of a data path, the data path width
actually refers to the number of outputs from the chip, and
not to what its name would lead one to believe. So with one
output, the data path width is one, even though there are 16
inputs.


Figure 2.36 Quadand Weinberger Array

The format for data path width specification is

(program <program name> <data path width>

Figure 2.38 shows the chip structure of multior.cif. It is
seen that the chip is composed of a small unclocked control
path unit alone, in the middle of the Weinberger Vdd/GND
comb. There are no data path organelles. As previous
experience would suggest, this control path has several
instantiation gap errors and cell translation errors (see
Figure 2.25). The large number of depletion pullup
transistors inherent to the Weinberger array is also
apparent. Combinational logic implementation in the control
path typically requires more depletion pullups than would be
required for the equivalent structure in the data path,
because all control path logic is done with NOR gates. Since
the pullups are always turned on, a MacPitts chip is not
expected to be very conservative of power. In the four input
OR gate, there were eight pullups in the Weinberger array,
and seven instantiation gap errors. In the 16 input OR
circuit, there are 30 pullup transistors, and approximately
40 gap errors. These errors are caused by instantiation of
the partial-gate-input cells (specifically,
partial-gate-input-ground-left and partial-gate-input-ground-right), and
they occur every time one of these cells is called.


;MULTIOR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<or>   FUNCTION BY MACPITTS SILICON COMPILER//16 Input gate//
(program multior 1
(def 1 ground)
(def 2 phia)(def 3 phib)(def 4 phic)
(def a signal input 5)
(def b signal input 6)
(def c signal input 7)
(def d signal input 8)
(def e signal input 9)
(def f signal input 10)
(def g signal input 11)
(def h signal input 12)
(def i signal input 13)
(def j signal input 14)
(def k signal input 15)
(def l signal input 16)
(def m signal input 17)
(def n signal input 18)
(def o signal input 19)
(def p signal input 20)
(def q signal output 21)
(def 22 power)
(always
(cond   (a (setq q t))
        (b (setq q t))
        (c (setq q t))
        (d (setq q t))
        (e (setq q t))
        (f (setq q t))
        (g (setq q t))
        (h (setq q t))
        (i (setq q t))
        (j (setq q t))
        (k (setq q t))
        (l (setq q t))
        (m (setq q t))
        (n (setq q t))
        (o (setq q t))
        (p (setq q t)))))

Figure 2.37 Multior.mac

MacPitts is limited in the data path as to how many
combinational logic cascades may be made. Since the control
path is designed to make decisions, the combinational logic
cascading constraint is absent for most practical chips.
Nevertheless, an error was detected in the multior.cif file,
Figure 2.38. From multior.mac in Figure 2.37, one would
expect the chip to have 22 pads: 16 input pads, one output
pad, three clock pads, one ground pad, and one Vdd pad. The
cifplot shows only 21 pads. This error does not show in the
command interpreter, where the 16 input OR function works as
expected. The error apparently lies elsewhere than in the
.mac file. The chip does function nevertheless, but as a
15 input OR gate instead of as a 16 input OR gate. The pad
deletion error (one fewer pad instantiated than specified
in the .mac file) occurs whenever an OR gate having more
than five inputs is specified in the .mac file. This is an
unexpected error, though not very serious. The control path
is rarely called on to do this sort of logic. If a special
function of this type is required of a MacPitts chip, the
designer can circumvent this problem by specifying an extra
input pad in the .mac file. The chip will compile to cif,
but the extra pad will not be instantiated nor will any of
the attendant combinational logic or wires.

7.    Control Path Semantics

The syntax (algorithm rules) for combinational
logic in the control path has been illustrated in the
previous sections. To gain an understanding of MacPitts, the
semantics (what the algorithm means) is more important than
how to say what it means.

The parallelism possible in MacPitts has been
previously referred to in the discussion of parallel testing
of conditions under a COND statement. This is not the only
place where MacPitts forces parallelism. Parallelism is also
forced upon all <actions> within a true condition under a
COND. The general form of a COND statement is

(cond ( <condition> <actions> <transition> ))

The <condition> is a Boolean variable upon which the
true/false test is made, the <actions> are SETQs, and the
<transition> is one of GO, CALL, or RETURN (to be discussed
in Chapter IV). In the previous example, both hot and cold
were Boolean conditional variables which would be tested in
parallel. The <actions> under the COND refer to a set of
SETQ assignment operators, and the SETQs under a COND are
all done in parallel, or simultaneously. The <transition>
form indicates a state transition to be made if <condition>
is evaluated as true. This state transition occurs in
parallel (same clock cycle) with the SETQs of the associated
<actions>. The state transition mechanism of MacPitts is very
straightforward and natural to a designer familiar with
Mealy type finite state machines. This topic will be
considered in depth in Chapters IV and V.

Note the difference between the parallelism implied
within the COND and that parallelism implied in condition
evaluation. The conditions are all examined in parallel, and
for the first one that evaluates to logical TRUE, all forms
within its scope are executed in parallel. This high degree
of implicit parallelism makes MacPitts ideally suited for
pipelined architectures. Consider the following code in
which three Boolean conditionals determine the outputs. The
destinations of the SETQs are also Boolean, and in this case
are non-storage elements (signals). The outputs are declared
signals instead of flags (which are storage devices) so that
when they are not set within a clock cycle they will
transition to false.


(cond
   (hot
      (setq fan_on t)
      (setq windows_open t)
      (setq doors_open t)
      (setq heater_on f))
   (cold
      (setq fan_on f)
      (setq windows_open f)
      (setq doors_open f)
      (setq heater_on t))
   (t
      (setq windows_open t)
      (setq doors_open t)))


This algorithm models a simple digital home
temperature controller where f refers to an inactive or
closed device, t refers to an active or open device, and a
comfortable temperature deadband exists between heating and
cooling requirements. All three Boolean conditions (hot,
cold, and true) are tested in parallel. The order of mutual
exclusion is the order in which the conditions are written
(if both cold and t are true simultaneously, only the
actions under cold will be executed). The conditional (t ...)
is the MacPitts equivalent of a reserved word, and indicates
the always true conditional. It is used in this algorithm as
the default state of the system, where the temperature is
comfortable enough to leave both the doors and windows open.
Even though (t ...) is always true, the evaluation order of
the conditionals prevents the forms under its scope from
being set unless both the preceding conditionals are false.
The actions under each true condition are also performed in
parallel, or in the same clock cycle. So the testing of all
three conditions and the resultant SETQ <actions> occur in
only one clock cycle, due to the implicit parallelism of
MacPitts. It is not necessary for the MacPitts programmer to
explicitly parallelize the forms under a COND; the MacPitts
compiler does this every time it encounters a COND. The
(setq <output> f) statements under the hot and cold
conditions are not required for this system. As explained
previously, the Weinberger array will set the output false
if it is not explicitly driven true for non-storage Boolean
variables. The (setq <output> f) statements have the
advantage of added clarity in the MacPitts driver algorithm
at the expense of increased size of the Weinberger array
(more decisions are required).
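The COND semantics described above can be modeled in a few
lines of Python. This is only a behavioral sketch and not
MacPitts code: the function name eval_cond and the dictionary
representation of signals are assumptions made for
illustration. It models two points from the text: the first
true condition in written order wins, and a signal that is
not driven in a clock cycle reads false.

```python
def eval_cond(clauses, inputs, outputs):
    """Model one clock cycle of a MacPitts-style COND over signals.

    clauses: list of (condition, assignments) pairs in written order,
             where condition is a function of the input dictionary and
             assignments is a dictionary of output values.
    outputs: names of all output signals; an undriven signal reads False.
    """
    result = {name: False for name in outputs}  # signals default to false
    for condition, assignments in clauses:
        if condition(inputs):            # first true condition wins
            result.update(assignments)   # all SETQs take effect together
            break
    return result


OUTPUTS = ["fan_on", "heater_on", "windows_open", "doors_open"]

# The home temperature controller, clause by clause (hot, cold, t).
CONTROLLER = [
    (lambda s: s["hot"], {"fan_on": True, "windows_open": True,
                          "doors_open": True, "heater_on": False}),
    (lambda s: s["cold"], {"fan_on": False, "windows_open": False,
                           "doors_open": False, "heater_on": True}),
    (lambda s: True, {"windows_open": True, "doors_open": True}),
]

comfortable = eval_cond(CONTROLLER, {"hot": False, "cold": False}, OUTPUTS)
cold_day = eval_cond(CONTROLLER, {"hot": False, "cold": True}, OUTPUTS)
```

In this model the default clause drives only the windows and
doors, so the fan and heater read false in the comfortable
state without any explicit (setq <output> f).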

The following code fragment produces the same
results, though is somewhat more obscure:


(always
   (par
      (setq fan_on hot)
      (setq heater_on cold)
      (setq windows_open (not cold))
      (setq doors_open (not cold))))


In this example, no conditional testing is
necessary although the results are equivalent to the
previous example, provided hot and cold are never true at
the same time. On every clock cycle, all of the forms
embraced by PAR are executed. On each clock cycle, the fan,
heater, windows, and doors are set to the correct state. The
resulting hardware is simpler, since fewer decisions are
required. This is the preferred format when conditional
testing can be explicitly done with Boolean logic in the
Weinberger array. But this code fragment lacks the ability
to branch. When transfer of control is required, then it is
necessary to use the full generalized COND statement

(cond ( <conditional> <actions> <transition> ))

form instead of the truncated version

(cond ( <conditional> <actions> ))
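The equivalence of the COND and PAR versions of the
temperature controller can be checked exhaustively. The
following Python sketch is an illustration only (the function
names cond_form and par_form are hypothetical); it encodes
each version as a truth function of hot and cold and lists
the input combinations on which they disagree.

```python
from itertools import product


def cond_form(hot, cold):
    """Outputs of the COND version; undriven signals read False."""
    if hot:
        return {"fan_on": True, "heater_on": False,
                "windows_open": True, "doors_open": True}
    if cold:
        return {"fan_on": False, "heater_on": True,
                "windows_open": False, "doors_open": False}
    return {"fan_on": False, "heater_on": False,
            "windows_open": True, "doors_open": True}


def par_form(hot, cold):
    """Outputs of the PAR version: every SETQ runs on every clock cycle."""
    return {"fan_on": hot, "heater_on": cold,
            "windows_open": not cold, "doors_open": not cold}


# Input pairs (hot, cold) for which the two versions give different outputs.
mismatches = [(hot, cold)
              for hot, cold in product([False, True], repeat=2)
              if cond_form(hot, cold) != par_form(hot, cold)]
```

Under this model the two versions differ only when hot and
cold are asserted together, a state which a real temperature
sensor is assumed never to produce.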


8.    Five Input AND Gates In The Control Path

The savings of area in the Weinberger array can be
substantial when Boolean decisions are made without a
precedent COND statement. Figure 2.39 shows the MacPitts
code used to generate a five input AND gate using COND for
each output, and Figure 2.40 shows the resulting Weinberger
array. Figure 2.41 is the logic gate equivalent of the five
input COND driven AND gate. Contrast this with Figure 2.42,
illustrating the code for generation of a five input AND
gate in the Weinberger array without CONDs, Figure 2.43, the
resulting Weinberger array logic generated by MacPitts, and
Figure 2.44, the logic gate equivalent of a five input AND
gate without CONDs.


;FIVEAND.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<and>   FUNCTION BY MACPITTS SILICON COMPILER//5 Input gate//
(program fiveand 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)
(def c signal input 7)
(def d signal input 8)
(def e signal input 9)
(def z signal output 10)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def 11 power)
(always
(cond   (a (setq z (and a b c d e)))
        (b (setq z (and a b c d e)))
        (c (setq z (and a b c d e)))
        (d (setq z (and a b c d e)))
        (e (setq z (and a b c d e)))
        (t (setq z f)))))

Figure 2.39 Fiveand.mac

Figure 2.41 Gate Equivalent of Fiveand Logic

The second structure is far simpler topologically, having
only six pullup transistors. The Weinberger array which
achieves the same results with CONDs (the code of Figure
2.39) requires twelve pullups by comparison. Since fewer
explicit decisions need to be specified, even the code of
the COND-less chip is more terse than its COND decision
counterpart. In comparing the logic gate circuit
equivalents, the five input AND gate created with CONDs
requires six inverters and six NOR gates, and the NOR gates
have fan-ins of five, six, seven, eight, and nine. There are
four levels to this structure. The five input AND gate
created without CONDs has only five inverters and one NOR
gate with a fan-in of five, and there are two levels of
gates. The circuit created without CONDs is smaller,
simpler, and faster.
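That the two five input AND specifications compute the same
function can be confirmed by enumeration. The Python sketch
below is illustrative only; the function names are
hypothetical. The COND version follows the clause order a, b,
c, d, e, t, and the COND-less version is written as five
inverters feeding one five input NOR gate, which is the
two level structure described in the text.

```python
from itertools import product


def nor(*inputs):
    """A NOR gate: true only when every input is false."""
    return not any(inputs)


def and5_cond(a, b, c, d, e):
    """COND version: the first true clause sets z to (and a b c d e)."""
    for condition in (a, b, c, d, e):
        if condition:
            return a and b and c and d and e
    return False  # the default (t (setq z f)) clause


def and5_direct(a, b, c, d, e):
    """COND-less version: five inverters into one five input NOR."""
    return nor(not a, not b, not c, not d, not e)


all_inputs = list(product([False, True], repeat=5))
equivalent = all(and5_cond(*v) == and5_direct(*v) == all(v)
                 for v in all_inputs)
```

Both versions agree with a plain five input AND on all 32
input combinations, so the difference between them is one of
circuit structure and not of function.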

;SIMPL5AND.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<and>   FUNCTION BY MACPITTS SILICON COMPILER//5 Input gate//
(program simpl5and 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)
(def c signal input 7)
(def d signal input 8)
(def e signal input 9)
(def z signal output 10)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def 11 power)
(always
(setq z (and a b c d e))))


Figure 2.42 Simpl5and.mac

Figure 2.43 Weinberger Array from Simpl5and.cif

Figure 2.44 Gate Equivalent of Simpl5and Logic

The economics of using COND-less algorithms does not always
justify their use. Silicon compilation is intended to free
the engineer from the micro-design aspects of creating a
chip, and Boolean minimization (see the home temperature
controller example) is a step away from this goal.
Typically, the control path is not used to implement
combinational logic functions, but rather to provide
controlling inputs to data path operations. The decision to
signal on five simultaneous TRUE inputs would always be done
as shown in Figure 2.42, and not as in Figure 2.39, but this
decision would usually have a COND embracing (around)
itself. The COND in MacPitts is used for decision. Attempts
to minimize CONDs will lead to a loss of clarity in the
algorithm (see the simplified home temperature controller
example). Nevertheless, if the Weinberger array becomes too
large and slow, Boolean reduction techniques such as
Quine-McCluskey or Karnaugh maps should be considered.

9.    A Better 16 Input Control Path OR Gate

A remarkable power savings in the Weinberger array
can be expected where this alternate algorithm (explicit
specification of outputs without use of COND testing) is
feasible. Figure 2.45 depicts another method of
algorithmically specifying a sixteen input logical OR
selector in the control path (compare with Figure 2.37).
Figure 2.38 shows the resulting layout from the algorithm
using multiple CONDs for selection, and Figure 2.46 shows
the Weinberger array layout resulting from the algorithm
using just Boolean logic specification. Figure 2.47 shows
the logic gate equivalent of Figure 2.46.


;SMPLMLTR.MAC
;SOURCE CODE FOR ALGORITHMIC CREATION OF LOGICAL
;<or>   FUNCTION BY MACPITTS SILICON COMPILER//16 Input gate//
;a simplified structure resulting from elimination of "cond"
(program smplmltr 1
(def 1 ground)
(def a signal input 5)
(def b signal input 6)
(def c signal input 7)
(def d signal input 8)
(def e signal input 9)
(def f signal input 10)
(def g signal input 11)
(def h signal input 12)
(def i signal input 13)
(def j signal input 14)
(def k signal input 15)
(def l signal input 16)
(def m signal input 17)
(def n signal input 18)
(def o signal input 19)
(def p signal input 20)
(def q signal output 21)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def 22 power)
(always
(setq q (or a b c d e f g h i j k l m n o p))))


Figure 2.45 Smplmltr.mac

Figure 2.46 Weinberger Array from Smplmltr.cif

Figure 2.47 Gate Equivalent of Smplmltr Logic

Note in particular the difference in number of pullup
transistors between the two circuits (Figures 2.38 and
2.46). There are thirty pullups in the circuit created
using COND testing, and only two pullups in the circuit
created from the COND-less algorithm. The pullup transistors
are always turned on, and as a consequence consume
proportionally more power than transistors which are
intermittently turned on. So a circuit power consumption
savings can be realized by COND-less decision specification,
where appropriate. But note that this is not always
possible, nor is the COND-less algorithm always as clearly
understood as the algorithm using COND for testing and
branching.
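The comparison can be made concrete with a short Python
sketch. Two assumptions are made which the text does not
state: that the two pullups of the COND-less circuit belong
to one sixteen input NOR gate followed by one inverter, and
that every pullup draws the same static current. The pullup
counts themselves (thirty and two) are taken from the text;
the function names are hypothetical.

```python
from itertools import product


def nor(*inputs):
    """A NOR gate: true only when every input is false."""
    return not any(inputs)


def or16_cond(bits):
    """COND version: the first true input sets the output true."""
    for bit in bits:
        if bit:
            return True
    return False  # an undriven signal reads false


def or16_direct(bits):
    """COND-less version: one 16 input NOR followed by an inverter."""
    return nor(nor(*bits))


# Exhaustive check over all 2**16 input combinations.
equivalent = all(or16_cond(v) == or16_direct(v)
                 for v in product([False, True], repeat=16))

PULLUPS_WITH_COND = 30     # from Figure 2.38, as stated in the text
PULLUPS_WITHOUT_COND = 2   # from Figure 2.46, as stated in the text

# Static power of the COND-less array relative to the COND array,
# under the equal-current-per-pullup assumption.
static_power_ratio = PULLUPS_WITHOUT_COND / PULLUPS_WITH_COND
```

Under these assumptions the COND-less array would draw about
one fifteenth of the static pullup power of the COND array
while computing the same function.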

These logic decisions would all occur electrically
in the Weinberger array (equivalently, occurring
algorithmically in the compiled LISP object code), since the
decision stipulations are Boolean and not integer. The forms
for Boolean combinational logic and integer (word)
combinational logic are syntactically different, and it is
necessary that the MacPitts programmer understand this
syntax difference in addition to the logical implementation
difference described previously.

10.   Two Considerations In MacPitts Programming

MacPitts  is  both  a  programming  language  and  a 
method  of  designing  digital  circuits.  As  such,  the 
programmer  must  consider  the  consequences  of  syntax  used  in 
the  driver  algorithm  (the  .mac  file).  It  is  not  always 
apparent  beforehand  whether  a  given  function  should  be  done 
in  the  control  path  or  in  the  data  path.  The  choice  is 
determined  by  the  syntax  used  by  the  designer. 

Suppose a four input AND gate is to be designed in
both the data path (word type) and in the control path
(Boolean type), where a, b, c, and d are inputs and z is the
output. The statement which relegates the decision to the
data path is

(setq z (word-and a (word-and b (word-and c d))))

where a, b, c, d, and z must all be either ports or
registers (integer valued). The corresponding statement for
the control path is

(setq z (and a b c d))


which requires that a, b, c, d, and z all be either signals
or flags (Boolean valued).
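The difference between the two forms can be illustrated in
Python. This is a sketch only: the helper names and the four
bit word width are assumptions chosen for the example, not
properties of MacPitts. The data path form nests two input
word operations on integers, while the control path form is a
single multi-input operation on Boolean values.

```python
from functools import reduce


def word_and(x, y, width=4):
    """Two input bitwise AND on words of the given width."""
    mask = (1 << width) - 1
    return (x & y) & mask


def data_path_and(words, width=4):
    """Nest two input ANDs as (word-and a (word-and b (word-and c d)))."""
    # Start from the innermost pair and work outward.
    return reduce(lambda inner, w: word_and(w, inner, width),
                  reversed(words))


def control_path_and(bits):
    """A single multi-input Boolean AND, as in (and a b c d)."""
    return all(bits)


z_word = data_path_and([0b1111, 0b1101, 0b0111, 0b0101])
z_bool = control_path_and([True, True, False, True])
```

Here z_word is an integer whose bits are the AND of the
corresponding bits of the four words, and z_bool is a single
Boolean value.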

In complicated architectures and most sequential
machines, this choice does not have to be made a priori, but
rather will be made by syntax in writing the MacPitts
algorithm. In simpler architectures, like a Hamming error
detector or a Gray code decoder, this decision should be
made beforehand. The choice can be regarded as one between
individual treatment of the data bits (usually done in the
control path logic), or treating the data as n-bit words
(done exclusively in the data path). Examples of algorithms
to do Gray code decoding and Hamming error detection and
correction are given in Chapters IV and VI.

The MacPitts programmer/designer must also consider
the hardware ramifications of syntax. The algorithm chosen
to implement a function in MacPitts drives the circuit
implementation to achieve that function.


It has been mentioned previously that COND forces
conditionals to be tested in parallel, and their associated
actions to be SETQ'd in parallel. This equates to a silicon
area/speed tradeoff on the chip. If multiple operations of
the same type are to be done under a COND, MacPitts will
instantiate copies of the required organelle, and perform
the operations in parallel. Conversely, if the same
operations are not put under a COND, MacPitts will
instantiate only one copy of the organelle, and perform the
operations serially. For instance, there are two ways to
perform a set of three data path logical two-bit ANDs on six
inputs. The first method does the operations in parallel, at
the cost of silicon area.


(cond (t
   (setq x (word-and a b))
   (setq y (word-and c d))
   (setq z (word-and e f))))


This  algorithm  fragment  would  execute  in  one  clock  cycle, 
but  MacPitts  would  implement  it  with  three  data  path  AND 
gate  organelles,  each  gate  having  two  inputs.  The  slower 
algorithm  would  be 


(setq x (word-and a b))
(setq y (word-and c d))
(setq z (word-and e f))


The second example would require three clock cycles to
execute, but only one data path AND organelle would be
instantiated. Similarly, PAR forces all forms within its
scope to be executed in parallel. The best way to verify
this is to create a short FSM algorithm, and manually clock
it while in the interpreter. (This is also an excellent
method to optimize algorithms for throughput by paralleling
operations where possible and testing for execution in the
interpreter. The results may not be what is expected.)
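The area/speed tradeoff stated above reduces to a simple
count. The following Python sketch merely restates the claim
of the text as a cost model (the function name schedule is
hypothetical): operations under a COND or PAR each receive
their own organelle and share one clock cycle, while
operations outside such a form share one organelle and take
one clock cycle each.

```python
def schedule(num_ops, parallel):
    """Return (organelles, clock_cycles) for num_ops identical operations."""
    if parallel:
        return num_ops, 1   # one organelle per operation, one cycle
    return 1, num_ops       # one shared organelle, one cycle per operation


parallel_cost = schedule(3, parallel=True)
serial_cost = schedule(3, parallel=False)
```

For the three word-and operations of the example, the
parallel form costs three organelles and one clock cycle, and
the serial form costs one organelle and three clock cycles.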

C.   SUMMARY 

This  chapter  discussed  the  differences  between  MacPitts' 
implementation  of  combinational  logic  in  the  control  path 
and  data  path.  The  fundamental  difference  is  one  of 
structure,  which  is  driven  by  syntax. 

When the data type is defined Boolean, and the correct
operations are applied to the bits, the combinational logic
occurs in the control path. Control path logic is always
done by a Weinberger array, an array of NOR gates. When the
data type is defined as integer, and the correct operations
are applied to the words, the combinational logic occurs in
the data path. The fundamental units of the data path are
two-input organelles, which are structural mappings of the
syntactical statements NOT, AND, NAND, OR, NOR, XOR,
increment/decrement, and add/subtract. The data path
performs the arithmetic functions and also generates signals
to control for decisions. Combinational logic syntax (and
hence structure) in the data path obeys the fundamental laws
of Boolean algebra, such as associativity and commutativity.
The designer must consider these laws in writing the
MacPitts algorithm if correct function is desired.

The LISP-like COND form produces parallelism in
MacPitts. The COND form is a statement which (structurally)
implements decisions in the Weinberger array and
(algorithmically) drives control flow in both the .mac file
and the .obj file. Control path structures may be reduced in
size (where possible) by not using the COND form to specify
output conditional setting. The alternative is the PAR
(parallelize) form, which parallels all the forms under its
scope. The forms embraced by PAR must be the functional
equivalents of those under COND, which requires designer
intervention and possibly Boolean algebraic reduction. The
result of this alternative is unconditional explicit
assignment of outputs. This is feasible in simpler chips,
and should always be considered on the basis of an
engineering tradeoff between design time and chip speed.

The COND statement, with multiple selections of
conditionals, can be viewed as an implicit AND-OR structure
realized in NORs in the Weinberger array. An alternate
syntactical viewpoint of COND is the CASE statement.
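The AND-OR view of COND can be sketched in Python as
follows. This is an illustration of the idea and not a
description of the array MacPitts actually generates; the
function names are hypothetical. A clause contributes to the
output when its own condition is true and every earlier
condition is false (the AND terms), and the output is the OR
of these contributions. Each AND term is then rewritten as a
NOR of complemented inputs, and the OR as a NOR followed by
an inverting NOR.

```python
from itertools import product


def nor(*inputs):
    """A NOR gate: true only when every input is false."""
    return not any(inputs)


def cond_priority(conds, values):
    """Reference semantics: the first true condition selects its value."""
    for c, v in zip(conds, values):
        if c:
            return v
    return False


def cond_and_or(conds, values):
    """AND-OR form: clause i fires if c_i holds and no earlier c_j does."""
    terms = []
    for i, (c, v) in enumerate(zip(conds, values)):
        selected = c and not any(conds[:i])
        terms.append(selected and v)
    return any(terms)


def cond_nor_only(conds, values):
    """The same structure written with NOR gates only."""
    terms = []
    for i, (c, v) in enumerate(zip(conds, values)):
        # c AND v AND NOT(earlier) == NOR(NOT c, NOT v, earlier...)
        terms.append(nor(nor(c), nor(v), *conds[:i]))
    return nor(nor(*terms))  # OR == inverted NOR


cases = list(product([False, True], repeat=3))
agree = all(cond_priority(c, v) == cond_and_or(c, v) == cond_nor_only(c, v)
            for c in cases for v in cases)
```

The three formulations agree for every combination of three
conditions and three values, which is the sense in which a
multi-clause COND is an implicit AND-OR structure realizable
in NOR gates.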

The gates created in this chapter are rather artificial,
in that they were made to show just the structures desired.
In practice, the combinational logic structures used are
likely to differ slightly.

III.   A SPEED-POWER COMPARISON BETWEEN A DATA PATH
AND CONTROL PATH EQUIVALENT CIRCUIT

A behavior-oriented silicon compiler requires a high
level algorithmic description of the chip's desired function
as its input. The output is a machine readable low level
geometric description of the resulting digital circuit,
usually CIF (Caltech Intermediate Form), a language
describing rectangles from which the various process masks
and their relative locations are registered. When a CIF file
is processed by MOSIS (Metal Oxide Semiconductor
Implementation Service), the desired chip results.

Chapter II considered the qualitative effects of
algorithmic syntax on some circuit structures in the data
and control paths. It is also desired to do a quantitative
investigation on functionally equivalent circuits in each
path, and to compare the results. The circuits chosen are
the five input AND gates in both their control path and
data path configurations. Handcrafted versions of the five
input AND gate are contrasted to the MacPitts five input AND
gates.

A.   DATA PATH FIVE INPUT AND GATE

Figure 3.1 shows the algorithm used to create a five
input AND gate in the data path. Figure 3.2 shows the
labelled cifplot of the four cascaded NAND organelles and
four inverters, and Figure 3.3 is the logic gate equivalent
of the cifplot. The LISP object file is included in Appendix
A to show how MacPitts implements the data path AND function
by invoking the organelle AND four times.


;FIVAND.MAC, data path
(program fivand 1
(def 1 ground)
(def a port input (2))
(def b port input (3))
(def c port input (4))
(def d port input (5))
(def e port input (6))
(def z port output (7))
(def 8 phia)
(def 9 phib)
(def 10 phic)
(def 11 power)
(always
(setq z
(word-and a (word-and b (word-and c (word-and d e)))))))


Figure 3.1 Data Path Five Input AND Gate .mac File

Figure 3.2 Stipple Plot of Data Path Five Input AND Gate

As discussed in Chapter II, the MacPitts algorithm produces
the LISP object file, from which MacPitts (the silicon
compiler) produces the layout. At run time, the MacPitts
(silicon compiler) script file shown in Appendix A is
created. The best way to create a script file of a MacPitts
terminal session is to issue the command

macpitts basename herald > basename.script &

where the option herald directs MacPitts to send compiler
messages (see compmesg.* files in MacPitts source code) to
the designated output device, ">" is the BSD Unix redirect,
basename.script is the file into which the terminal session
is to be recorded, and "&" is the Unix command to put a
process into the background. If the algorithm is not fully
debugged, then issue instead

macpitts basename herald

so  MacPitts  diagnostics  and  Liszt  diagnostics  both  will  come 
to  the  screen,  and  no  hardcopy  recording  will  occur.  It  is 
possible  to  both  monitor  and  simultaneously  record  the 
MacPitts  compilation,  by  issuing  the  command 


script basename.script     (starts script recording)

to which Unix will respond with

"script started, filename is basename.script"

then issue the full path command (a Unix bug requires this)

/vlsi/macpit/bin/macpitts basename herald

and when compilation is done type control-d to terminate the
script recording. The script capability is useful for
following the MacPitts compilation process, gives insight
into how MacPitts works, and assists in debugging the driver
algorithm. Tracing of MacPitts' compilation of an algorithm
can then be done with a grep search on the compmesg.* files
for the statistics and the hl.lisp files for the herald
messages. If the algorithm halts execution, the script file
indicates where in the compilation process the error was
detected. That part of the algorithm can then be checked for
errors.

Figure 3.3 Gate Equivalent of Figure 3.2

Figure 3.4 Stipple Plot Showing Critical Nodes

The script of a MacPitts session also has informative
material (statistics) on the chip size, components, maximum
power used, and host computer effort expended to compile the
chip. Carlson [Ref. 2: p. 43] describes the script file
produced by a MacPitts compilation session.

After the basename.cif file is produced by MacPitts, it
is necessary to comment out the beginning user extension
zero lines with the vi screen editor. This is done by
invoking vi on the cif file

vi basename.cif

and placing parentheses around these lines. Carlson
[Ref. 2: p. 70] explains why this is necessary. The Caesar
file must next be created so labelling of nodes can be done
for Mextra (Manhattan Circuit Extractor). The command to
convert a .cif file to a .ca file is

cif2ca -o <offset> basename.cif

where the offset is a number added to the Caesar symbolxx.ca
files to distinguish them from previously created symbol
files which might have the same number (xx).

The procedure described above results in a MacPitts end
product, the basename.cif file, and a version of that file
amenable to editing in the VLSI graphics editor Caesar, the
basename.ca file. For quantitative analysis of a MacPitts
design, further steps are required.

To begin this analysis, the nodes are labelled (in
Caesar) for Mextra and Crystal (a timing analyser). Work by
Froede [Ref. 3: pp. 63-80] addresses Crystal analysis of
MacPitts circuits. After the input, output, GND, and Vdd
nodes are labelled, the following commands are issued

:save
and then,
:cif -p

in Caesar to save the new labelled .ca file and to create a
.cif file with nodes at points (-p) for Mextra. Figure 3.2
is the point-labelled cif plot of the data path five input
AND gate. Next Mextra is invoked on the labelled file by the
command

mextra  -o  basename 

where the -o switch causes more accurate capacitance
calculation (than is done without -o). Mextra produces the
basename.nodes file, which can be checked for connectivity
and to see that all labelled nodes are included. Appendix A
shows the .nodes file for the data path AND gate. The
basename.sim file is also produced, and can be used for
switch level simulation with Esim, SPICE simulation, Crystal
timing analysis, and power estimation with Powest. The
berk85 version of Crystal is more useful than the berk83
version. To record a Crystal session,
start the script recording, and then call Crystal with its
full path designator

/vlsi/berk85/bin/crystal basename.sim

Crystal has many options and commands. The 1985 version of
the Crystal manual which describes them is available on the
Naval Postgraduate School VAX 11/780 in the file

/vlsi/berk85/doc/crystal/crystal.tblms

Appendix A shows the script recording of a Crystal analysis
of the data path AND gate. After the input and output nodes
are assigned and the delay is given, the command

critical -g filename.dummy

is issued, then Crystal is stopped with

quit

and then script is terminated with control-d. The critical
command determines the time-critical (i.e., slowest) signal
path, and the -g (graphical results) switch in conjunction
with it creates a Caesar-compatible file of the critical
node locations as shown in Appendix A. This file can then be
added to the basename.ca file by the sequence of commands

caesar basename      (Caesar edit labelled file)

:source filename     (add critical nodes to screen)

Since the Crystal nodes displayed in Caesar are not
reproduced in cif, the nodes must be edited in Caesar if
an annotated stipple plot is desired. One technique is to
erase the Crystal-sourced (created by the :source command)
nodes, replace them with implant layer squares (implant
for visibility and contrast), and then relabel the delay
times with Caesar's :label command. The revised Caesar file
can then be saved and converted to cif for stipple plotting
with the series of commands

:save
and then

:cif -p


Figure 3.4 shows the cifplot of the circuit with the
critical  nodes  marked.  The  critical  nodes  lie  along  what 
Crystal  considers  the  critical  (slowest)  path.  The  largest 
delay  shown  is  the  circuit  cumulative  delay,  and  each  marked 
node  indicates  a  cumulative  delay.  This  makes  it  simple  to 
determine  the  delay  between  critical  nodes  as  the  difference 
between  their  successive  cumulative  delays.  The  stipple  plot 
can  be  difficult  to  interpret  if  it  is  desired  to  determine 
what  structure  causes  the  delays.  A  gate  equivalent  of  the 
cifplot  can  be  helpful  in  the  analysis.  The  gate  level 
equivalent  of  this  circuit  with  marked  cumulative  delays  is 
shown  in  Figure  3.5.  The  data  path  AND  gate  spreads  the 
delay  out  evenly,  with  approximately  10  ns  per  gate,  as  is 
expected   from  the  transistor  aspect  ratios  shown  in   Figure 


The maximum power consumed by the circuit can be
determined in either of two ways. The MacPitts script
session (of the compilation process) records it, or Powest
(Power ESTimator) can be used on the basename.sim file
produced by Mextra. Powest computes the power based on only
the number of depletion transistors, assuming that they are
on all the time (for the maximum power figure) or on half
the time (for the average power figure). MacPitts considers
both the number of depletion transistors and the power
consumed by the circuit wires, so the MacPitts power should
be the more accurate of the two. The command to use Powest
on the .sim file is

powest  -p  <  basename.sim 

where the -p switch directs Powest to print out informative
data about the circuit, and the < is the Unix input
redirect, which directs the .sim file to Powest. Appendix A
shows the result of a Powest analysis of the five input data
path AND gate. Checking the Powest result can also serve as
a check on the accuracy of Mextra's nodal extraction. For
example, from Figure 3.2, the cifplot, there are eight
depletion pullup transistors and no enhancement pullups or
special transistors. The Powest analysis in Appendix A
confirms this count. This transistor count verification is
important in a MacPitts data path design analysis. It has
been observed that the Vdd bus (top metal trace, Figure 3.2)
does not always connect with the vertical lines to the
pullup transistors. The gap is so small that it is not
usually evident in Caesar, although a design rule checker
such as Lyra will detect it.

B.   CONTROL  PATH  FIVE  INPUT  AND  GATE 

Chapter II discussed the two different types of
control path five input AND gates possible. The COND driven
AND gate was structurally more complicated (Figure 2.40),
while the "CONDless" AND gate was comparatively simple
(Figure 2.43). The COND driven AND gate is more likely to
occur in practice (since the purpose of the Weinberger array
is decision making, or conditional control), so that circuit
is analysed in this section.

Figure 3.5 Gate Equivalent of D.P. AND Showing Delays

Figure 3.6 Control Path Five Input AND Gate .mac File

Figure 3.6 is the MacPitts driver which creates the
control path to implement this logic. Figure 3.7 is the
resulting Weinberger array, which has had the
odd_partial_gate input gap errors repaired in Caesar (so
Lyra and hence Mextra will work, and produce a valid .sim
file). Figure 2.41 is the logic gate equivalent of the
Weinberger array. Appendix A contains the object file for
this chip. The NOR character of the Weinberger array logic
was discussed in Chapter II, and in the LISP object file all
logic is done with NORs. Appendix A also contains the LISP
object file for the equivalent data path function, and in
Figure 3.2 all logic is implemented in AND organelles. The
Weinberger array is composed of inverters also, but an NMOS
technology inverter is just a degenerate (single input) NOR
gate. The difference in implementation from a software
(language) perspective is that the data path function is
done in organelles, and the control path function is done
exclusively in NORs. The data path organelles are already
compiled in the organelles.lisp files, so MacPitts has to
work harder to create the equivalent function in the control
path. Both the basename.obj file and the cifplot of the
Weinberger array show the NOR logic implicit in control path
combinational logic. The MacPitts script file is shown in
Appendix A, and its data path counterpart is also shown for
comparison. These files contain information which will be
compared in the next section.
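
The claim that organelles and NOR-only logic realize the same function can be checked exhaustively. The Python sketch below is a hedged illustration: it models the data path as four cascaded two-input AND organelles (a NAND followed by an inverter) and the control path as the simplest NOR-only form, five inverters feeding one five-input NOR. That NOR form is an assumption chosen for brevity and is not the exact COND driven array of Figure 3.7. All function names are hypothetical.

```python
from itertools import product


def nor(*inputs):
    """NOR gate: true only when every input is false."""
    return not any(inputs)


def nand(a, b):
    return not (a and b)


def and_organelle(a, b):
    """Data path AND organelle modelled as a NAND followed by an inverter."""
    return nor(nand(a, b))  # a single-input NOR acts as an inverter


def data_path_and5(a, b, c, d, e):
    """Linear cascade of four two-input AND organelles."""
    return and_organelle(and_organelle(and_organelle(and_organelle(a, b), c), d), e)


def nor_only_and5(a, b, c, d, e):
    """Five inverters (single-input NORs) feeding one five-input NOR."""
    return nor(nor(a), nor(b), nor(c), nor(d), nor(e))


mismatches = [
    bits for bits in product([False, True], repeat=5)
    if data_path_and5(*bits) != nor_only_and5(*bits)
]
print("mismatching input patterns:", len(mismatches))
```

Both forms agree on all 32 input patterns, so the differences discussed in this chapter are in layout, speed, and power rather than in logical function.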

The same CAD tools were used on this circuit as were
used on the data path circuit, in the same order. Mextra
produces the .nodes file (Appendix A). The control path
logic also differs from the data path logic in the number of
nodes produced to model the equivalent circuit. The
Weinberger array node list is approximately 25% larger than
the equivalent data path node list. Appendix A contains the
Crystal analysis of the circuit, and the critical path file
for source input to the Caesar file. Figure 3.8 depicts the
Weinberger array with the critical nodes marked, and Figure
3.9 is the gate level equivalent of Figure 3.8 with delay
node values and gate equivalent fan-ins marked. Appendix A
contains the Powest analysis of the control path AND gate,
and this information is incorporated into the following
table for comparison.

C.   SPEED-POWER  COMPARISON 

Table 3.1 compares functionally equivalent MacPitts five
input AND gates in both their control and data path
configurations.

Figure 3.7 Weinberger Array From C.P. Five Input AND Gate

Figure 3.8 Weinberger Array With Critical Nodes Marked

Figure 3.9 Gate Equivalent of Weinberger Array With Delays

TABLE 3.1

FIVE INPUT AND GATE

                                  DATA PATH     CONTROL PATH

MacPitts power [W]                .0407         .0381

Powest power
  average [W]                     .00182        .00094
  maximum [W]                     .00245        .00188

Maximum delay, Crystal [ns]       81.15         85.98

Length x width of logic
  circuit [lambda]                209 x 173     386 x 113

Number of pullups (less pads)     12            8

Compile time [CPU min]            2.106         1.535

CPU peak memory demand [kb]       349

So all other things being equal, the data path circuit
is superior to the control path circuit in terms of power
consumption, size, and compile time in MacPitts, and
slightly inferior in terms of maximum speed attainable.

The data path power advantage is understandable when the
number of depletion pullups there is compared to the number
in the control path. A power consumption ratio of 0.67 is
expected, and the calculated ratio is close to that. The
difference is explained by the long horizontal polysilicon
runs in the Weinberger array, which have a comparatively
high specific resistance (ohms/square), and therefore
consume more power. The first row in the table above,
MacPitts computed power, is calculated on the whole chip and
not on just the logic circuitry. This value shows a similar
power consumption relationship, but the poly runs connecting
the Weinberger array to the rest of the circuit consume
additional power (the rest of the analysis in the table
above is done on just the logic circuits, and not on the
whole chip).
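
The ratios discussed above can be reproduced with a few lines of arithmetic. The Python sketch below uses the pullup counts quoted in the text (eight and twelve) and the power figures from Table 3.1. To stay independent of the column order, it always divides the smaller figure by the larger. The variable and function names are illustrative assumptions.

```python
# Values quoted in the text and in Table 3.1.
pullup_counts = (8, 12)                   # smaller and larger pullup counts
powest_max_watts = (0.00188, 0.00245)     # smaller and larger Powest maxima
macpitts_chip_watts = (0.0381, 0.0407)    # whole-chip MacPitts figures


def ratio(pair):
    """Return the smaller value divided by the larger value."""
    low, high = sorted(pair)
    return low / high


expected = ratio(pullup_counts)
powest = ratio(powest_max_watts)
whole_chip = ratio(macpitts_chip_watts)

print(f"expected from pullup counts: {expected:.2f}")
print(f"Powest maximum power ratio:  {powest:.2f}")
print(f"MacPitts whole-chip ratio:   {whole_chip:.2f}")
```

The pullup count predicts a ratio near 0.67, the Powest maxima give about 0.77, and the whole-chip MacPitts figures give about 0.94, which is consistent with the pads and interconnect diluting the difference between the two logic blocks.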

The speed of the two circuits is approximately the same.
Figures 3.4 and 3.8 show the Crystal-generated delay data on
the data path and control path circuits. The results are
perhaps clearer in Figures 3.5 and 3.9, the logic gate
equivalents of the cifplots. In the data path (Figure 3.4),
the signal experiences approximately 21 ns delay per
organelle. The organelle comprises a NAND gate and an
inverter (Figure 2.14). From the gate equivalent, and the
Crystal script (Figure 3.7), each NAND gate induces a delay
of 9.4 ns, and each inverter induces a delay of 11.4 ns. The
circuit shown in the gate equivalent is expected to produce
a delay equal to the product of the number of organelles and
the delay per organelle. The expected delay is then 4 x 20.8
= 83.2 ns. The cifplot (Figure 3.2) reveals where the added
three ns delay arises. The river routing routine in MacPitts
runs the input and output lines in polysilicon, and in this
case the output comes from across the circuit. The specific
resistance and capacitance of polysilicon and the poly input
and output line lengths constitute this added delay. Froede
[Ref. 3: pp. 72-76] has validated Crystal's timing
calculations and compared them for accuracy with the theory
presented in Mead and Conway [Ref. 4: pp. 3-14].
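
The delay estimate above, and the earlier remark that the delay between marked nodes is the difference of successive cumulative delays, can be sketched in Python. The per-gate figures are those quoted in the text. The assumption that every NAND and every inverter in the cascade has the same delay is a simplification, and the variable names are illustrative.

```python
from itertools import accumulate

NAND_DELAY_NS = 9.4       # per NAND gate, from the Crystal figures quoted above
INVERTER_DELAY_NS = 11.4  # per inverter
ORGANELLES = 4            # cascaded AND organelles in the five input AND gate

# Gate-by-gate delays along the cascade: NAND, inverter, NAND, inverter, ...
gate_delays = [NAND_DELAY_NS, INVERTER_DELAY_NS] * ORGANELLES

# Cumulative delay at the output of each gate, as Crystal annotates it.
cumulative = list(accumulate(gate_delays))

# Delay between successive marked nodes is the difference of neighbours.
recovered = [b - a for a, b in zip([0.0] + cumulative, cumulative)]

expected_total = cumulative[-1]
print(f"per organelle: {NAND_DELAY_NS + INVERTER_DELAY_NS:.1f} ns")
print(f"expected total: {expected_total:.1f} ns")
```

The cascade alone accounts for roughly 83 ns, which leaves about three ns to be explained by the polysilicon input and output runs.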

Figure 3.8 is the corresponding cifplot, with Crystal
delay annotation, for the Weinberger array. The
structure of the Weinberger array is, at first glance,
intimidating. Two observations on function assist in
understanding the structure. (1) Any GND track that connects
a Vdd track with only one diffusion gate is an inverter, and
(2) any GND track that connects a Vdd track with multiple
diffusion gates is a multiple input NOR gate. The transverse
poly runs turn on and turn off the NOR gates and inverters.
This cifplot shows six inverters and six NOR gates.
Furthermore, multiple input/single output Weinberger arrays
appear to always exhibit the four level structure shown in
Figure 3.9: a bank of inverters followed by a bank of
multiple input NORs followed by a single multiple input NOR
followed by an output inverter. Figure 3.9 is the gate level
equivalent of the Weinberger array in Figure 3.8, with delay
annotation and fan-in (shown inside the bodies). The
critical path is from input A to the second level nine-input
NOR through the output NOR through the output inverter. The
Weinberger array total delay is then 81.15 ns, not much
different from the data path circuit delay. This delay
calculation only considered the Weinberger array, however,
and not the connections to it which MacPitts creates in
polysilicon. If these additional connections were
considered, the Weinberger array would certainly be slower
than the equivalent structure in the data path. Figure 3.8
shows the critical path (annotated with cumulative delay
times), and it is evident that the longest delay path occurs
along the wires which must charge the largest capacitances.
The data path block is connected to the rest of the chip
with metal lines (in most cases), so this added delay from
polysilicon runs would not apply to it.
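
The two reading rules for a Weinberger array can be written as a small classifier. The Python sketch below is a minimal illustration: each GND track is represented only by the number of diffusion gates it connects, and the example gate counts are hypothetical values chosen to match the totals quoted in the text, not measurements taken from Figure 3.8.

```python
def classify_tracks(gates_per_track):
    """Classify each GND track by how many diffusion gates it connects.

    One gate -> inverter; more than one -> multiple input NOR.
    """
    kinds = []
    for gate_count in gates_per_track:
        if gate_count < 1:
            raise ValueError("a track needs at least one gate")
        kinds.append("inverter" if gate_count == 1 else "NOR")
    return kinds


# Hypothetical gate counts: six single-gate tracks and six multi-gate tracks,
# chosen only to match the totals quoted in the text.
example_tracks = [1, 1, 1, 1, 1, 2, 3, 2, 9, 2, 1, 2]
kinds = classify_tracks(example_tracks)
print(kinds.count("inverter"), "inverters,", kinds.count("NOR"), "NOR gates")
```

Applied to a real array, the gate counts would be read off the cifplot track by track.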

The relative sizes of the data path and control path
circuitry are as expected from their respective object code
descriptions. The object code for the data path
instantiation is approximately half the size of the code for
the control path. From a theoretical viewpoint, the cascaded
AND organelle circuit is more conservative of both silicon
and power than is the Weinberger array. This principle
applies to most combinational logic in MacPitts, since the
Weinberger array builds functions only from NOR gates,
whereas in the data path the choice of building blocks is
larger (NAND, NOR, and inverter). The MacPitts chip size
comparison is given in the table above, but the circuit
dimensions are more informative. The data path circuitry has
an area of .090 square mm, and the Weinberger array covers
.109 square mm, about 120% of the area of the data path
functional equivalent.
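
The area comparison can be cross-checked against the lambda dimensions in Table 3.1. The short Python sketch below computes the ratio both ways. The assignment of 209 x 173 to the data path and 386 x 113 to the Weinberger array follows the table, and the variable names are illustrative.

```python
data_path_lambda = (209, 173)    # length x width, from Table 3.1
weinberger_lambda = (386, 113)


def area(dimensions):
    length, width = dimensions
    return length * width


ratio_from_lambda = area(weinberger_lambda) / area(data_path_lambda)
ratio_from_mm2 = 0.109 / 0.090   # areas quoted in the text, in square mm

print(f"area ratio from lambda dimensions: {ratio_from_lambda:.2f}")
print(f"area ratio from square mm figures: {ratio_from_mm2:.2f}")
```

Both calculations give a ratio of about 1.2, so the two sets of figures agree with each other.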

The compile time for the control path chip is
approximately 25% greater than for the data path chip. This
is understandable in light of the gate instantiation process
for each path. From the cifplots in Figure 3.2 (data path)
and Figure 3.7 (control path), the circuits are not even
remotely similar structurally. The data path circuit is made
from quadruple instantiation of the MacPitts library AND
organelle (see Appendix A, the object code). This organelle
is accessed four times, its location calculated, and then
it is instantiated. The control path Weinberger array
(Figure 3.7) requires time consuming decisions and
construction from more primitive units, NOR gate inputs (see
the object code, Appendix A). The poly cross-runs must then
be laid down. All of these processes are computationally
intensive, and this is why large control-heavy Weinberger
array architectures take a long time to compile. Chapter VI
describes the design of a control path chip and how long it
required for compilation.

D.   ALTERNATE  POSSIBILITIES  FOR  FIVE  INPUT  AND  GATES 

The five input AND gate, as implemented by MacPitts in
both its data path and control path configurations, has been
examined above. Each configuration can be improved in the
areas of speed and circuit density. While the goal of
silicon compilation is to free the designer from excessive
preoccupation with detail, perhaps the combinational logic
generation by MacPitts can be improved. The following
section presents two hand-designed variants of the five
input AND gate for comparison with the MacPitts designs.

The first design is patterned after the Mead-Conway
cells as illustrated throughout [Ref. 4]. The layout is
similar to that generated by MacPitts for the five input
data path AND gate, a linear cascade of NANDs and inverters.
Figure 3.10 shows the hand-crafted circuit. It is noticeably
different from the MacPitts design in two ways. The pulldown
transistors on the NAND gates are four lambda wide. This
allows a shorter data path, while preserving the 4:1 aspect
ratios of the transistors. Also, the characteristic MacPitts
pullup diffusion "dogleg" is absent. This is accomplished by
joining the pullup diffusion and polysilicon layers with an
in-line buried contact. The circuit is also less wide than
the MacPitts equivalent. MacPitts uses NAND organelles, and
interconnects them with metal/poly/diffusion wires. This
wastes a lot of space. In the hand-designed five input AND
gate, the output is taken from the pullup on a polysilicon
wire, and routed directly to the input of the next
transistor. This saves (at a minimum) two contact cuts in
the transistor interconnections. As expected, this
configuration is also considerably faster than the MacPitts
equivalent. The MacPitts data path five input AND gate
requires 86 ns for signal propagation, and the handcrafted
design requires 22 ns. Figure 3.11 shows the gate equivalent
of the hand design, with propagation times marked above the
respective gates.

This configuration is amenable to silicon compilation if
the NAND-NOT pairs as shown are incorporated into the
MacPitts organelle library as an AND organelle. Similar
speed and area enhancements are expected for other data path
logic gates.

Figure 3.10 Mead-Conway Style Five Input Linear AND

Figure 3.11 Gate Equivalent of Figure 3.10 With Delays

If the multiple input AND gate can be improved so much
using the basic MacPitts data path cascading scheme, does a
better method exist using another approach? The drawback to
the cascading scheme is the linear pileup of transistors.
This requires more silicon, and consequently more current to
charge the gates of later stages. A better design would use
only one gate for the five input AND function, as shown in
Figure 3.12. This is a true five input AND gate, as opposed
to the previous circuits which only emulate the five input
AND function. The circuit is much smaller than the previous
five input AND gates, and is much faster. Figure 3.13 is the
gate equivalent with marked delays. This circuit is
patterned after those circuits illustrated in [Ref. 4] also.
The wide (10 lambda) pulldown region permits a comparatively
short transistor (i.e., the pullup aspect ratio is not very
large). The multiple input NAND and NOR derivatives
patterned after this gate should be simple to incorporate
into the silicon compiler. The only decisions required are
how many inputs (set by the designer), spacing of the input
wires (set by the design rules), and pulldown diffusion
column width (must be calculated as a function of number of
input wires to the gate). If a silicon compiler is desired
which produces fast, compact combinational logic circuitry,
this method should be considered. Table 3.2 compares the
data path AND gate (DP), the control path AND gate (CP), the
hand-crafted linear cascaded AND gate (LC), and the
multiple-input AND gate (MI).
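
The remark that the pulldown column width must be calculated from the number of inputs can be made concrete with the usual ratioed NMOS sizing rule. The Python sketch below is an assumption-laden illustration, not a description of MacPitts: it assumes series (NAND style) pulldowns, a pullup to pulldown impedance ratio k of 4 (the 4:1 figure mentioned earlier), and a 2 lambda gate length. The function names are hypothetical.

```python
def pullup_aspect_ratio(num_inputs, pulldown_length, pulldown_width, k=4):
    """Length/width needed for the pullup of a gate with series pulldowns.

    Each pulldown contributes length/width squares of resistance, the
    series stack adds them up, and the pullup must be k times that total.
    """
    return k * num_inputs * pulldown_length / pulldown_width


def pulldown_width_for(num_inputs, pulldown_length, pullup_ratio, k=4):
    """Pulldown width that keeps a chosen pullup aspect ratio."""
    return k * num_inputs * pulldown_length / pullup_ratio


# Five inputs, assumed 2 lambda gate length, 10 lambda wide pulldown column.
print(pullup_aspect_ratio(5, 2, 10))
# Width needed to keep the same pullup if a sixth input is added.
print(pulldown_width_for(6, 2, 4))
```

Under these assumptions a 10 lambda wide column lets a five input gate use a pullup only four squares long, and the required width grows linearly with the number of inputs.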


Figure 3.12 Compact Five Input AND Gate

Figure 3.13 Gate Equivalent of Optimal Geometry Five
Input AND Gate Showing Delays


TABLE 3.2
COMPARISON OF FIVE INPUT AND GATES

        size        pullups    max. pwr.    prop. delay
        [mm**2]                [mW]         [ns]

DP      .09         12                      81.15
CP      .109        8                       85.98
LC      .01         12         2.0          21.97
MI      .004        1          .5           6.05

IV.   SEQUENTIAL LOGIC IN MACPITTS

Based on previous analysis, combinational logic in
MacPitts is done better (i.e., more efficiently, when a
choice exists) in the data path than in the control path.
Does the possibility of improving MacPitts' sequential logic
performance exist also? A study of this question presents
interesting problems.

A.   AN  OVERVIEW 

Chapter II discussed two different ways of increasing
throughput, the PAR form and the COND form. There exists
also a method of global parallelism available to the
MacPitts programmer, the PROCESS form. The PROCESS form has
the syntax

(process <process name> <stack depth> ... )

where the process name is an arbitrary ASCII character
string (if the name is made short, then the VT-100/ADM-3A
interpreter screen can display them all). The stack depth
refers to the depth of subroutine calls for which this
process must push return addresses onto its program
counter LIFO stack. MacPitts syntax requires the designer
to determine this stack depth a priori, and to explicitly
state it to MacPitts (the silicon compiler). The stack
depth is a required field in the PROCESS statement, and
may be any integer including zero. Each process has its
own stack, and all processes are executed in parallel.
This parallelism provides a high throughput on a properly
designed algorithm.
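
Because the stack depth must be worked out by hand, it can help to compute it from the call structure of the algorithm before writing the PROCESS statement. The Python sketch below is a hypothetical helper, not a MacPitts facility: it takes a table of which routine calls which and returns the deepest chain of nested calls. The routine names in the example are illustrative.

```python
def required_stack_depth(entry, calls):
    """Deepest chain of nested subroutine calls reachable from `entry`.

    `calls` maps a routine name to the routines it calls. Recursion is
    rejected because a fixed-depth hardware stack cannot support it.
    """
    def depth(name, active):
        if name in active:
            raise ValueError(f"recursive call through {name!r}")
        callees = calls.get(name, [])
        if not callees:
            return 0
        return 1 + max(depth(c, active | {name}) for c in callees)

    return depth(entry, frozenset())


# Hypothetical call structure: one process with no calls, one calling a
# single subroutine, and one with two levels of nesting.
calls = {
    "lite": [],
    "clock": ["mod60"],
    "outer": ["middle"],
    "middle": ["inner"],
}
for process in ("lite", "clock", "outer"):
    print(process, required_stack_depth(process, calls))
```

A process that calls nothing needs a depth of zero, a process that calls one subroutine needs a depth of one, and each further level of nesting adds one.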

An extension of the digital home temperature controller
of Chapter II might also control other aspects of the home
environment. For instance, it would be desirable to turn the
security lights on and off by a photoelectric cell signal,
to start the coffee brewing and the microwave oven cooking
dinner at a timer signal, and to keep the lawn appropriately
watered by turning the sprinkler on upon a moisture detector
signal. The following MacPitts program outline would
accomplish these tasks. All logic is done on Boolean
variables, flags for storage and signals for sensor inputs.

(program house (word size)

   (port, signal, register, and flag assignments)

   (process lite 0
      (setq lights (not photo_cell_input)))

   (process food 0
      (cond
         (six_am
            (setq mrcoffee t))
         (seven_am
            (setq mrcoffee f))
         (four45_pm
            (setq put_dinner_in t))
         (five_pm
            (setq microwave_on t))
         (five30_pm
            (setq microwave_on f))))

   (process environ 0
      (cond
         (hot
            (setq fan_on t)
            (setq window_open t)
            (setq doors_open t))
         (cold
            (setq heater_on t)
            (setq window_open f)
            (setq doors_open f))
         (t
            (setq heater_on f)
            (setq fan_on f)
            (setq window_open t)
            (setq doors_open t))))

   (process grass 0
      (setq sprinkler_on (not lawn_moist)))

   (process clock 1
      (par (call mod60) (setq time counter_out))
      mod60
      <a modulo sixty up counter algorithm> (return)))


All of these processes are done in parallel. All of the
processes have a stack depth of zero except for the clock
process, which has a stack depth of one. This is necessary
due to the clock process calling a subroutine, the modulo
sixty up counter. The call of the counter and the following
SETQ are paralleled with the PAR construct. This PAR
paralleling appears to work well for cases where the output
depends on the called routine, like the example above. If
the dependency is reversed (for instance, paralleling SETQs
of inputs to a slow multiplier subroutine with the CALL to
that multiplier) some unpredictable results can arise. A
good practice is to emulate all time-dependent algorithms
alone in the interpreter prior to their incorporation into
the MacPitts algorithm. In so doing, syntax errors may be
found and fixed and the algorithm may be optimized for
number of cycles required to execute.

For fast architectures, some additional speed can be
gained by paralleling the subroutine outputs with the
RETURN from the subroutine. For instance, the mod60
counter-timer in the previous example is called as a
subroutine.

mod60
   (par (setq counter_out count) (return))

There exists no time-dependency between the final result
(counter_out) and the RETURN to the main program, so no
data latency results from this paralleling.

To re-emphasize, all of the PROCESSes under the PROGRAM
statement execute in parallel. So while the <house> chip is
monitoring temperature and time, it is simultaneously
monitoring lawn moisture, setting the house clock, and
checking the outside light level. PROCESSes execute
independently, in parallel. Each PROCESS has its own
independent stack, and processes do not communicate
internally with each other. From the hardware standpoint,
each process is an independent MacPitts entity sharing data
storage elements and signal wires.

In this somewhat artificial example, there is no strict
requirement for speed. If the lawn is watered 50
microseconds late, the grass will still grow. But the
principle of global process parallelism applies to more
complicated digital systems where intricate timing
interrelationships exist. It is also evident that MacPitts
is a very versatile silicon compiler. A chip constructed
from a similar multi-process algorithm could be used to
control many off-chip processes simultaneously. The
intrinsic nature of the PROCESS form lends itself well to
applications such as industrial digital control. In
situations where the PROCESS statement is used to force
parallelism but the parallelism is not needed (for instance,
the <house> algorithm), MacPitts creates a large layout.
Silicon area is traded off for speed.

This algorithmic outline illustrates using PROCESSes in
a combinational logic machine. PROCESSes are required around
any invocation of a subroutine, but aside from this
consideration, the <house> chip could be specified just as
well without PROCESSes.

PROCESSes are required, however, to describe a
sequential logic machine in MacPitts. The FSM architecture
is explicitly specified by the PROCESS form. The PROCESS
statement implicitly specifies creation of sequencers (a data
path hardware organelle, which steps the FSM through its
states) and their instantiation in the data path.

B.   GRAY CODE TO BINARY DECODER

The following section illustrates the MacPitts design of
a simple sequential logic system. The Gray code [Ref. 5:
p. 97] finds many diverse uses in electrical engineering and
computer science. Whenever a single bit change in successive
data words is desired (disk sector addressing, radar
antenna positioning), the Gray code should be considered. In
finite automata theory, the Gray code decoder can be
regarded as a sequence detector. The desired sequential
machine complements the input on having received an odd
number of earlier 1's, and does not complement the input on
an even number of 1's. An example sequence is

input:    1 1 1 1 0 0 0 0 1 0 1 0 1 1 0 0 1 ...
output:   - 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 1 ...

The Gray code decoder can be implemented in MacPitts as
a Mealy FSM to detect this sequence, and set the appropriate
outputs. The automaton for the Gray code decoder is shown in
Figure 4.1. The node label MSBS indicates most significant
bits, COMPL means complement the present bit, and NEXTBIT
means consider the next bit.
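
The decoding rule can be checked in software before committing it to a MacPitts algorithm. The Python sketch below is a simplified model, assuming a single state bit that records whether an odd number of 1's has been seen; it is not a transcription of the three-node diagram in Figure 4.1, and the function name is hypothetical. It reproduces the example sequence above from its second bit onward (the first output position is shown as a dash in the example, while the sketch simply passes the first bit through).

```python
def gray_to_binary_serial(bits):
    """Decode a Gray coded bit stream, most significant bit first.

    The single state bit records whether an odd number of 1's has been
    received so far; the output depends on state and input (Mealy style).
    """
    odd_ones_seen = False
    output = []
    for bit in bits:
        output.append(bit ^ 1 if odd_ones_seen else bit)
        if bit == 1:
            odd_ones_seen = not odd_ones_seen
    return output


stream = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1]
decoded = gray_to_binary_serial(stream)
print(decoded)
```

The same function also decodes ordinary Gray code words, for example the three bit word 110 to binary 100.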

Figure 4.1 Gray Code Decoder State Transition Diagram

1.   Algorithm Design

The next consideration is algorithm design. Previous
experience inclines the designer toward a data path
architecture (faster, smaller, less power consumption).
Furthermore, a data path chip would probably have a greater
throughput, since the operations could be done on words, and
not individual bits (e.g., a parallel Gray code decoder,
which decodes on a word basis rather than a bit-by-bit
basis).

The  problem  with  this  approach  is  that  MacPitts 
permits  no  explicit,  succinct  method  of  setting  the 
individual  bits  in  a  word.  The  bits  can  be  tested  with  the 
BIT  expression,  but  not  set.  So  a  control  path  (implying 
Boolean  type  data  and  Weinberger  array  combinational  logic) 
architecture  is  probably  a  better  choice. 

A control path FSM can be designed with MacPitts (even
though no explicit data path is used). The reason is the way
in which MacPitts implements FSM state transitioning with the
sequencer organelles. The sequencer can be thought of as a
bank of n sequencer organelles, where n is the data path width
specified in the PROGRAM statement. The sequencer organelles
are physically adjoined to the data path organelles in the
MacPitts chip. The sequencer stores FSM state, much in the
same way as flip-flops store state in a discrete-chip FSM
design. And just as two raised to the power (number of
flip-flops) limits the states in a discrete digital system, so
two raised to (number of sequencers) limits the states
possible in a MacPitts sequential machine. The number of
sequencers is always equal to n, the data path width. This has
ramifications for MacPitts designers considering a system of
many states with a narrow data path. The possible number of
states is limited to 2**n.
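The 2**n limit can be turned around to give the narrowest data
path that will hold a given number of states. The following
Python sketch is illustrative only; the function name is an
assumption and is not part of MacPitts.

```python
def min_data_path_width(num_states):
    """Smallest n for which 2**n >= num_states."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    # (num_states - 1).bit_length() is the ceiling of log2(num_states).
    return (num_states - 1).bit_length()


# Three states need a two-bit data path; a one-bit path holds only two.
width_for_three_states = min_data_path_width(3)
```

For example, a machine with three states needs n = 2, and one
with five to eight states needs n = 3.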
One solution to the Gray code problem is to use a data path
architecture, to declare the data path width as two, and to
specify an extra (unused) bit in the input and output port
declaration statements. The most significant bit of the input
port is obviously extraneous, but the data path width of two
is necessary to address the three states required (Figure
4.1). When the Gray code chip is used, these extra pins must
be tied to ground. If a data path width of one is specified
instead (and PORTS are used for inputs), MacPitts gives the
following diagnostic

    Error-Word length too small to store the state for
    this process

If the data path width is left as two, but the input and
output ports are left only one bit wide (another attempt to
circumvent this problem), MacPitts responds with

    Error-Invalid port definition

which means that the data path width was declared as two, but
the port is only one bit wide (MacPitts has helpful
diagnostics). The MacPitts source code file, extract.lisp
(under the def get-sequencer-from-process macro) shows why
this constraint exists. The sequencer width is explicitly set
to the data path width.

Figure 4.2 shows the MacPitts driver code to do the Gray code
to binary conversion serially. The MacPitts algorithm shown in
Figure 4.2 has the lines numbered for reference, but the
numbers are not part of the allowed MacPitts syntax. Line 1 is
the title, using a semicolon as the reserved comment
designator. Line 2 is the PROGRAM statement; the program name
is gc (Gray code) and the data path width is two.

 1  ;Grey Code to binary conversion algorithm
    ;This code illustrates the Data Path (i.e.,
    ;Integer) solution to the problem. It is but one
    ;variant of many possible solutions.
    ;Define the data path width as 2 (state transitioning)
 2  (program gc 2
 3    (def 1 ground)
 4    (def 2 phia)
 5    (def 3 phib)
 6    (def 4 phic)
    ;All FSMs must have a RESET input (for initialization)
 7    (def reset signal input 5)
    ;Use INTEGER (port) input & output, 2 bits wide
 8    (def inp port input (6 7))
 9    (def bin port output (8 9))
10    (def 10 power)
    ;Specify FSM architecture
11    (process grycod 0
12     msbs     ; (Most Significant Bits)
13      (cond ((=0 inp) (setq bin 0) (go msbs))
14            ((= 1 inp) (setq bin 1) (go compl)))
15     compl    ; (COMPLement bits)
16      (cond ((=0 inp) (setq bin 1) (go compl))
17            ((= 1 inp) (setq bin 0) (go nextbit)))
18     nextbit  ; (NEXTBIT in string)
19      (cond ((=0 inp) (setq bin 0) (go nextbit))
20            ((= 1 inp) (setq bin 1) (go compl)))))

Figure 4.2 Gc.mac

Lines 3, 4, 5, 6, and 10 are standard, and required by
MacPitts conventions. Line 7 is required for all FSMs, and
when it is raised high (positive logic arbitrarily chosen
here), the FSM/PROCESS is reset to its initial state. Line 8
defines the input port, inp, and declares it integer two bits
wide. Line 9 does the same for the output port, bin (binary
value). Line 11 specifies FSM architecture with the PROCESS
statement, for which the stack depth is zero (no calls to
subroutines). Line 12 is a node label, msbs (most significant
bits), and represents the top node in Figure 4.1. Line 13 is
the first check in this state, and says that if the input
equals zero, then set the output to zero and go to node msbs.
If the input does not equal zero, then go to the next line of
code. Line 14 checks whether the input equals one. If the
input is equal to one, the output is set to one, and the
program transitions to the complement (compl) state. Line 15
implements the second node in Figure 4.1, complementing the
input. Line 16 checks the input, and if it equals zero it
complements and keeps complementing as long as the input
equals zero, and if not, it proceeds to the next line. Line 17
checks for the sequence of an even number of ones, and if
true, sequences to the next node after complementing the
input. Line 18 is the label corresponding to the last node in
Figure 4.1, nextbit. Line 19 checks the input bits, sets the
output to the input value, and returns to this node as long as
the input is zero. Line 20 also sets the output to the input
value, but jumps back to the bit complement node when the
input is one. The conditional in line 17 is unnecessary, but
is included for clarity (if the non-storage port, bin, is not
explicitly set to one, it will become zero at the next state
transition. Line 17 can be eliminated, and the algorithm will
work correctly anyway).
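The three states and six COND clauses of the gc.mac process
can be restated as a transition table. The Python sketch below
is an illustrative software model of that Mealy machine, not
MacPitts output; the dictionary layout and the function name
are assumptions made for the example.

```python
# (present state, input bit) -> (output bit, next state),
# one entry per COND clause in lines 13-20 of gc.mac.
TRANSITIONS = {
    ("msbs", 0): (0, "msbs"),
    ("msbs", 1): (1, "compl"),
    ("compl", 0): (1, "compl"),
    ("compl", 1): (0, "nextbit"),
    ("nextbit", 0): (0, "nextbit"),
    ("nextbit", 1): (1, "compl"),
}


def run_gc(bits, state="msbs"):
    """Feed a bit stream through the FSM; return (outputs, final state)."""
    outputs = []
    for bit in bits:
        # Mealy behavior: the output depends on state and input together.
        out, state = TRANSITIONS[(state, bit)]
        outputs.append(out)
    return outputs, state


outputs, final_state = run_gc([1, 1, 0, 1])
```

Because each output is looked up from the pair (state, input),
the table makes the Mealy character of the machine explicit.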

The next step is to test and debug the algorithm in the
interpreter prior to full compilation. The Gray code algorithm
was debugged in the interpreter, and compiled with the
<herald> option. Appendix B shows the script recording of the
compilation process, and indicates a data path of seven
different organelles (to be discussed in the next section) and
a moderate-sized (31 columns) Weinberger array.

Figure 4.3 shows the chip resulting from the compilation of
gc.mac. The functional constituents of this layout will be
treated qualitatively in the next section.

2.  Functional Constituents Of The Chip

The layout scheme of MacPitts places general functional
blocks in specific relative locations on the chip. Figure 4.4
indicates where these relative locations lie on the cifplot.
The block sizes shown in Figure 4.4 are arbitrary, since the
actual sizes depend on a combination of algorithm and MacPitts
(the source code). In comparing Figure 4.4 to Figure 4.3, it
is seen that this chip has no flags, which is expected since
none are defined in the source algorithm. The rest of the
blocks shown in Figure 4.4 are instantiated in gc.cif (Figure
4.3).

The data path arithmetic block is shown in Figure 4.5. The
function of this unit is to operate on the inputs so that the
desired outputs result. The inputs enter the arithmetic block
and the outputs exit as shown in Figure 4.5. Between input and
output, the data is subject to switching and various logic
operations. The data path and the control path must also
communicate with each other over the interconnecting traces.
The leftmost top poly line, D9, is an input to the Weinberger
array, where it turns on five NOR gates. Similarly, the other
nine lines also connect to the control path. Lines D8, D7, D5,
D4, D3 (reset), D2, D1, and D0 are outputs from the control
path and inputs to the data path. Line D6 is the other output
from the data path to the control path. The inputs to the data
path can be understood as relay controls, or switches. The
outputs from the data path to the Weinberger array are Boolean
values to cause decisions about what to do next.

Figure 4.3 Gc.cif

Figure 4.5 Data Path Arithmetic Block From Gc.cif

From Figure 4.5, the arithmetic path of this chip is seen to
be two bits wide (the two horizontal parallel organelle
chains). In Chapter II it was shown that syntax implicitly
controls instantiation. Line 13 in the Gray code algorithm
specifies two data path operations

    (cond ((=0 inp) (setq bin 0)

where the (=0 inp) is a logical comparison integer test, and
(setq bin 0) is an integer form by definition of bin in the
def statement and the source for bin being an integer, zero.
The leftmost set of cascaded OR gates makes the (=0 inp) test,
and signals the control path on line D9. Figure 4.6 shows the
logic diagram for this stipple plot, and the results for a
zero input.

Proceeding right on the arithmetic block stipple plot, the
next block is a set of paralleled NOR gates. The inputs are
the inp bits, inp0 and inp1, and Vdd and GND. The output is a
signal to the control path from D8 which determines the chip
output, bin (BINary equivalent of the Gray code bit stream).
This circuit does not directly make the output assignment,
(setq bin 0), but rather does it through combinational logic
in the Weinberger array. Figure 4.7 is the logic diagram of
the setq operation circuitry. The circuit is annotated to show
a zero bit input on inp, in which case a TRUE is sent to the
control path on line D8.

Proceeding right in the data path, the next two blocks in
Figure 4.5 show pass transistor units. The leftmost pass
transistor unit has inputs from bin0, bin1, and control on D7.
The output is a signal to control on D6. This section of the
data path is where the output bin is set, although the logic
for setting bin is determined in the preceding two data path
units and the control path. To the right of this unit is
another pass transistor block which takes inputs from the
previous pass transistor unit, from the clock drivers, from
control on lines D5, D4, and D3, and from the sequencer. The
function of this unit is state transition. The sequencer
inputs represent the current state, and this unit drives the
state registers which signal next state to the sequencer tail,
at far right. The input D3 is the reset signal, which
implements the MacPitts function of returning the FSM to its
initial state when raised high.

Figure 4.8 shows the state registers, a set of parallel 2-T
memory cells, in which the current state is held. The inputs
to the state registers are the outputs of the previous pass
transistor block, signalling next state transition, and the
three clock lines from the clock driver. The outputs are the
two state bits (S0 and S1) to the control path (on lines
marked C1 and C0, Figure 4.10). The Mealy FSM methodology is
evident in MacPitts from both the algorithmic and hardware
viewpoints. The output is a function of both input (inp0,
inp1) and present state (S0, S1).

Below the state registers in Figure 4.3 are the clock
drivers. Figure 4.9 is a blowup of the driver organelles, used
for buffering the clock signals and generating the five
overlapping clock signals. The drivers are turned on by a
signal from the Weinberger array, C5. Carlson describes the
clocking scheme and the reasons behind its choice [Ref. 2:
p. 26].

The rightmost block in the data path is the sequencer. Figure
4.10 is the cifplot of the sequencer combinational logic, and
Figure 4.11 is its gate equivalent. The sequencer has as its
inputs the current state (S0, S1) and produces as its outputs
the next state (S0+1, S1+1). The gate diagram of the sequencer
answers the question asked in the initial design of the Gray
code decoder, why three states are not allowed in a control
path (i.e., a data path width of zero) architecture. The
answer lies in the implied data path structure, as explained
previously and as graphically shown in Figure 4.10 and Figure
4.11. The data path width as specified in the PROGRAM
statement sets the number of sequencers to be instantiated,
and the number of sequencers limits the number of states
possible. If fewer FSM states are required than the sequencer
depth can transition to, the sequencers are nevertheless
instantiated, but their outputs are not connected to the
control path (C0 and C1 in this example). For example, this
would occur for a wide data path which had few states. If a
data path FSM chip were designed with a word length of five,
and only four FSM states were needed, MacPitts would
instantiate all five of the sequencer organelles. Only the top
two would be connected to the Weinberger array. Figure 4.12 is
a block diagram of the MacPitts sequencer organelles, and
shows how the Mealy FSM is implemented. The multiplexers on
each side of the state registers determine that the next state
is a function of both present state and present input. The
Weinberger array controls the gating in the multiplexers to
allow the appropriate signals to pass to the state registers.

Figure 4.8 MacPitts State Registers

Figure 4.9 Clock Drivers and Five Segment Generator
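The bookkeeping described above (every sequencer organelle is
instantiated, but only enough of them to encode the states are
wired to the Weinberger array) can be sketched as follows. This
is an illustrative Python model under the stated 2**n limit; the
function name and the error text are assumptions, not MacPitts
internals.

```python
def sequencer_usage(data_path_width, num_states):
    """Return (connected, unconnected) sequencer organelle counts."""
    # Number of state bits needed to encode num_states states.
    needed = (num_states - 1).bit_length()
    if needed > data_path_width:
        raise ValueError("word length too small to store the state")
    # All data_path_width sequencers exist; the remainder are unused.
    return needed, data_path_width - needed


# Word length of five with four FSM states, as in the text.
connected, unconnected = sequencer_usage(5, 4)
```

With a word length of five and four states, two sequencers are
connected and three are left unconnected, matching the example
in the text.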
The Weinberger array is the control path in a MacPitts chip,
for reasons explained in Chapter II. The Weinberger array is
shown in Figure 4.13, and its labelled gate equivalent is
Figure 4.14. In the cifplot, all input and output columns have
been labelled (A-Z) for comparison. The output lines have also
been labelled (Cn) for reference to the other functional
blocks of the chip. There are major differences between this
multiple function Weinberger array and the single function
Weinberger arrays considered previously.

The Weinberger array for a single output function always has
a four level structure, inverter-NOR-NOR-inverter. This is not
the case for multiple output Weinberger arrays. This circuit
has 11 inverters and 15 NOR gates. The maximum fan-in on any
NOR gate is six. In the previous Weinberger arrays, the
maximum delay was approximately four gate delays. In this
Weinberger array, the longest path is shown in Figure 4.14 as
J-U-T-L-F-G-D, or Q-W-T-L-F-G-D. Each path induces
approximately seven gate delays. The MacPitts script session
(included in Appendix B) lists the control depth (NOR gate
nesting) incorrectly as four. Furthermore, the polysilicon
runs cover proportionally more area in this Weinberger array
than in the previous single function ones. From Chapter III,
the polysilicon to substrate capacitance is a strong factor in
limiting chip speed. The multiple function Weinberger arrays
are expected to be slow.

Figure 4.13 Weinberger Array from Gc.cif

Figure 4.14 Gate Equivalent of Gray Code Weinberger Array

This Weinberger array has nine outputs (C11, C10, C8, C7, C6,
C5, C4, C3, and C2) and five inputs (C13, C12, C9, C1, and
C0). C12 is a check on the input signal values, and comes from
D9 (D indicates a signal to or from the data path, C indicates
a signal to or from the control path). C11 is an output to D8
in the data path, the function of which is not clear (data
path output connecting control path output). C10 is an output
to D7, and the signal controls pass transistor gating in the
left pass transistor unit, which determines the value of the
output (bin0, bin1). C9 is an input to the Weinberger array,
and comes from D6. This input is not set within the data path,
and it is likely that it results from MacPitts' expectations
of a more complicated structure. The sequencer organelles
exhibit this vestigial structure property also, as previously
mentioned. C8, C7, and C6 are outputs which control the second
pass transistor block (state register multiplexer) in the data
path. They connect to D5, D4, and D3, respectively, and
control the sequencer's next-state transitioning. C6 is
connected to pin five by a polysilicon run and C13, so C6 (D3)
is the reset signal. C5 is an output which turns on the clock
drivers. C4, C3, and C2 are outputs connecting the data path
at D2, D1, and D0, where they control pass transistor gating
for the sequencers and state register. C1 and C0 are inputs
from the state register which represent the current state.
Figure 4.15 shows the data path-control path interconnections.
The interconnections are summarized in the diagram below.

      (inp) (bin) PT    ves   PT: state               PT: seq and reg
rst   D?    D8    D7    D6    D5    D4    D3    clk   D2    D1    D0    S1    S0
C13   C12   C11   C10   C9    C8    C7    C6    C5    C4    C3    C2    C1    C0
rst

In this diagram, rst means reset, PT is a pass transistor
unit, ves is a vestigial (non-functional) unit, seq is
sequencer, and reg is the state register.

3.  Alternate Designs

The gc.mac algorithm used explicit value assignment in the
output setq forms.

    (setq bin <value>)

In this case, it is possible to explicitly set the output to a
value (one or zero). This is not possible, however, for all
algorithms, and is not even desirable in the general case.
Usually the output is a function of the input(s), and not a
specific value which is known beforehand. With this in mind,
an alternate algorithm was written to implement the Gray code
to binary conversion. Figure 4.16 shows the algorithm,
gc2.mac. This code follows the state diagram given for gc.mac
(Figure 4.2), and the states all have the same names. The
algorithms are equivalent functionally and semantically (they
both do and say the same thing). The only difference is in the
binary output (bin) setq forms. In the previous algorithm,
gc.mac, the output bin is explicitly set to either one or
zero. In this algorithm, gc2.mac, the output is set to a data
path function of the input. The code represented by gc2.mac
represents the more general case.

Figure 4.15 Data Path/Control Path Interconnection
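The difference between the two styles of setq can be
illustrated with a short Python sketch. It assumes that
word-not complements every bit of the data path word, which is
an inference from the operator's name rather than something
stated here, and the table names are hypothetical.

```python
def word_not(word, width=2):
    """Complement every bit of a width-bit word (assumed semantics)."""
    return ~word & ((1 << width) - 1)


# Value written to bin for inp = 0 and inp = 1 in each state.
EXPLICIT = {                      # constants, as in gc.mac
    "msbs": (0, 1),
    "compl": (1, 0),
    "nextbit": (0, 1),
}
COMPUTED = {                      # functions of inp, as in gc2.mac
    "msbs": (0, 1),
    "compl": (word_not(0), word_not(1)),
    "nextbit": (0, 1),
}

# Compare only the least significant bit, the one that carries data.
same_low_bit = all(
    (COMPUTED[state][inp] & 1) == EXPLICIT[state][inp]
    for state in EXPLICIT
    for inp in (0, 1)
)
```

Under that assumption the two versions agree on the low-order
output bit in every state; only the unused upper bit of the
two-bit word differs in the compl state.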

The chip created by gc2.mac is expected to be larger than the
one created by gc.mac, since additional data path decisions
are required in the setq forms. The script file of the gc2
MacPitts session (Appendix B) verifies this, and Figure 4.17
shows the resulting layout. In comparing the two script files,
it is seen that gc2 requires more data path units, data path
transistors, and control path transistors. This is reflected
in the comparative complexities of the data paths in Figure
4.3 and Figure 4.17. The chip produced by gc2 would also
consume slightly more power and be slightly larger than the
chip produced by gc. The conclusion is that by explicitly
specifying the setq destination values, the designer can save
area and power consumption. A reasonable expectation would
also be a faster chip.

Explicit assignment of outputs is therefore desirable, though
not always feasible. In many control path architectures, where
the output is treated as individual bits, explicit assignment
is possible (though not always the optimal solution, see
Chapter VI on Hamming error-correctors, where there are many
outputs possible). In data path or hybrid architectures where
there are only a few numerical outputs possible, explicit
assignment of output values should also be considered (see the
blackjack algorithm, following). A general rule is to choose
the method that results in the shorter algorithm, whether (1)
explicit assignment of outputs, or (2) assignment of outputs
as a function of either inputs or intermediate values. The
significance of this is that the designer can influence the
design by the MacPitts program written, even though the
silicon compilation process is automatic.

The  two  previous  algorithms  assume  serial  decoding. 
If  it  is  desired  to  do  the  decoding  faster,  parallel 
decoding  should  be  considered.  MacPitts  has  a  mechanism  for 
this  implicit  in  the  integer  data  types  (which  look  at  a 
data  word  in  parallel),  and  the  multiple  PROCESS  algorithm, 
which  performs  independent  functions  in  parallel.  Parallel 
data  processing  will  be  considered  in  Chapter  VI.

The alternate solution (control path logic) to the Gray code
decoder is shown in Appendix B for comparison to gc.mac and
gc2.mac. The script and cif files are included also for
comparison.


;GRAY CODE to BINARY conversion algorithm
(program gc2 2
  (def 1 ground)
  (def 2 phia)
  (def 3 phib)
  (def 4 phic)
  (def reset signal input 5)
  (def inp port input (6 7))
  (def bin port output (8 9))
  (def 10 power)
  (process grycod 0
   msbs
    (cond ((=0 inp) (setq bin inp) (go msbs))
          ((= 1 inp) (setq bin inp) (go compl)))
   compl
    (cond ((=0 inp) (setq bin (word-not inp)) (go compl))
          ((= 1 inp) (setq bin (word-not inp)) (go nextbit)))
   nextbit
    (cond ((=0 inp) (setq bin inp) (go nextbit))
          ((= 1 inp) (setq bin inp) (go compl)))))

Figure 4.16 Gc2.mac

Figure 4.17 Gc2.cif
C.  A BLACKJACK GAME

The previous section discussed MacPitts sequential logic
implementation as a function of algorithmic syntax. A simple
finite state machine was developed, and the structural
ramifications of the source algorithm were investigated. This
section will discuss development of a more complex algorithm,
and its consequent structure.

1.  The Algorithm

The blackjack game algorithm was developed based on the
following rules. The rules are expressed as FSM states, since
the transition to MacPitts syntax is easier that way. The
capitalized words correspond to node names and MacPitts
variables.


s0: START, initialize
s1: ACCEPT card (?) (F, go s0), add FACE value to SCORE
s2: if ace and no prior ace valued as 11, SCORE=SCORE+10
s3: if SCORE<=16, HIT, go s1
s4: if SCORE>21 and previous ace valued as 11,
    SCORE=SCORE-10, go s5
s5: if SCORE>21 and no previous ace valued as 11, BROKE,
    go s1
s6: if 17<=SCORE<=21, STAND, go s1


The next step is to create a state transition diagram, and
then to translate the game rules into the appropriate MacPitts
entities (ports, registers, signals, and flags). This is
usually done from an English description, and then the number
of states is minimized by standard techniques. Figure 4.18
shows the transition diagram, which is not minimized for the
sake of clarity. There are seven nodes in the diagram. The top
node is start, the initial state and the state to which the
FSM reverts when the reset signal is brought high. The next
node is draw, where the player draws a card (simulated by an
off-chip random number generator). The third node is labelled
ace, and represents decisions made if an ace is drawn. The
next node, htchk, checks for a hit condition (draw another
card). Following htchk is devalu, which decrements the rscore
contents when appropriate. Then the broke (lose game)
condition is tested in the brkchk (broke check) state.
Finally, the stand check node, stchk, tests if the stand (win)
condition exists, and the program returns to the initial state
for either replay or termination. The state transitions follow
from the preceding rules. The MacPitts driver algorithm is
written on the basis of the state transition diagram. The
driver is shown in Figure 4.19.
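The node sequence just described can be modelled in software
to check the game logic before it is committed to MacPitts
syntax. The Python sketch below is an illustrative model only:
it replaces the clocked card-accept handshake with a loop over
a list of cards, and the function and variable names are
assumptions made for the example.

```python
def play_hand(cards):
    """Play one hand; cards are face values with an ace given as 1."""
    score, ace_as_11 = 0, False              # start: initialize
    for face in cards:                       # draw: accept a card
        score += face
        if face == 1 and not ace_as_11:      # ace: value the ace as 11
            score += 10
            ace_as_11 = True
        while True:
            if score <= 16:                  # htchk: hit, draw again
                break
            if ace_as_11 and score > 21:     # devalu: revalue ace as 1
                score -= 10
                ace_as_11 = False
                continue                     # recheck the hit condition
            if score > 21:                   # brkchk: no ace left at 11
                return "broke", score
            return "stand", score            # stchk: 17 <= score <= 21
    return "hit", score                      # cards ran out while hitting


result = play_hand([10, 6, 10])
```

A hand such as 10, 6, 10 ends in the broke condition at 26,
while an ace followed by a ten stands at 21.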

Figure 4.18 Blackjack Game State Transitions

 1  ;B5.MAC  BLACKJACK MACHINE
 2  (program blackjack 5
 3    (def 1 ground)(def 2 phia)(def 3 phib)(def 4 phic)
 4    (def face port input (5 6 7 8 9))
 5    (def hit signal output 10)(def stand signal output 11)
 6    (def broke signal output 12)
 7    (def score port output (13 14 15 16 17))
 8    (def accept_card signal input 18)
 9    (def reset signal input 19)
10    (def 20 power)(def rscore register)
11    (def aceflg flag)(def acptflg flag)
12    (always (setq acptflg accept_card))
13    (process play 0
14     start
15      (cond (acptflg (setq rscore 0)(setq aceflg f)))
16     draw
17      (cond (acptflg (setq rscore (+ rscore face))
18                     (setq score rscore) (go acenode))
19            (t (go start)))
20     acenode
21      (cond ((and (= face 1) (not aceflg))
22             (setq rscore (+ rscore 10))
23             (setq score rscore)
24             (setq aceflg t)))
25     htchk
26      (cond ((unsigned-<= rscore 16)(setq hit t) (go draw)))
27     devalu
28      (cond ((and aceflg (unsigned-> rscore 21))
29             (setq rscore (- rscore 10))
30             (setq aceflg f)
31             (setq score rscore)
32             (go htchk)))
33     brkchk
34      (cond ((and (unsigned-> rscore 21)(not aceflg))
35             (setq broke t) (go start)))
36     stchk
37      (cond ((and (unsigned-<= rscore 21)
38                  (unsigned->= rscore 17))
39             (setq stand t) (go start)))
    ))

Figure 4.19 B5.mac

Storage elements are required for state transition decisions
under the CONDs, so these variables must be flags (aceflg and
acptflg). Line 11 in the source code reflects this. The
arithmetic comparisons are made on integer values, and these
must likewise be storage elements, so this variable is defined
as a register (rscore, line 10). Since the FSM progresses
asynchronously with the output (no new output with each clock
cycle), there must also be a port (score, line 7) to clock the
register value to. Similarly, a port (face, line 4) is defined
as the input (face value) of the card. Whenever an output is
produced asynchronously with the clock, the latching operation

    (setq <register> integer_value)

must be made. One method of clocking the register contents to
the output port is to use the ALWAYS statement under the
PROGRAM statement.

    (program <name> <data path width>

    (always (setq <output_port> <register_contents>))
    (process <name> <stack depth>

This will ensure accurate current output values. In the
blackjack algorithm, this procedure will not work. If the
statement

    (always (setq score rscore))

is used, the algorithm would appear to work in the command
interpreter. Upon compilation, however, the following LISP
compiler (Liszt) diagnostic results,

    Error: Non-number to minus nil
    <1>

where the first line of the diagnostic indicates an attempted
arithmetic operation on an empty LISP atom or list, and the
second line is the LISP debugger prompt CRef. lisp. 11-13.

The reason why this does not work (for this algorithm) is
that rscore has not been initialized (as in Fortran, for
example) at execution of the ALWAYS statement. The LISP
primitive representing rscore is at this time a nil, or empty,
atom. The solution is to clock the register (rscore) to the
output port (score) whenever it changes value. Lines 18, 23,
and 31 show this other method of register transfer to ports.

There are some new forms in b5.mac which also require
discussion. The integer test which returns a Boolean value to
control is

    (<signed><inequality type> integer1 integer2)

where the field <signed> is required, and is either blank or
the string "unsigned-" for the less than, less than or equal,
greater than, or greater than or equal tests. The comparison
is made with the <inequality type> between integer1 and
integer2.

For instance, if temp is an integer variable set
equal to 72, hot is an integer variable set to 88, and cold
is an integer variable set to 60, the following forms would
produce the signals to control shown.

FORM                               SIGNAL TO CONTROL

(cond ((= hot 88)))                T
(cond ((unsigned-< hot 99)))       T
(cond ((unsigned-<= hot 89)))      T
(cond ((= temp hot)))              F
(cond ((unsigned-> temp hot)))     F
(cond ((unsigned->= 70 temp)))     F

The result of the integer comparison test is a Boolean
value, and as such is used as a conditional under COND, as
shown in Figure 4.19.
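The six tests tabulated above can be checked with a short sketch. The Python below is not MacPitts code: the dictionary TESTS and the function signal_to_control are hypothetical names introduced only for illustration, and ordinary Python comparisons stand in for the unsigned tests on the assumption that all operands are nonnegative.

```python
import operator

# Hypothetical stand-ins for the MacPitts integer tests. The keys
# mirror the form names in the table; the mapping to Python
# operators assumes nonnegative operands.
TESTS = {
    "=": operator.eq,
    "unsigned-<": operator.lt,
    "unsigned-<=": operator.le,
    "unsigned->": operator.gt,
    "unsigned->=": operator.ge,
}


def signal_to_control(test, integer1, integer2):
    """Return the Boolean that the named test would send to control."""
    return TESTS[test](integer1, integer2)


# Variable values taken from the text.
temp, hot, cold = 72, 88, 60

results = [
    signal_to_control("=", hot, 88),
    signal_to_control("unsigned-<", hot, 99),
    signal_to_control("unsigned-<=", hot, 89),
    signal_to_control("=", temp, hot),
    signal_to_control("unsigned->", temp, hot),
    signal_to_control("unsigned->=", 70, temp),
]
print(results)
```

The list reproduces the T/F column of the table in order.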

The remaining forms in the algorithm have been
previously explained. The algorithm b5.mac (which required
five tries to obtain a successful compilation) follows the
FSM state transition diagram with the syntax given. The
algorithm has been exhaustively tested (only possible with
simple FSMs) in the command interpreter.

2.    The  Chip 

Figure 4.20 shows the cifplot resulting from
b5.mac. The appearance is similar to the Gray code decoder
layout, with the exception of an added functional block at
the top right. This is the flag block, resulting from line
11 in b5.mac. The flag block is both a source and a
destination for control signals, as the driver syntax
suggests.

Figure 4.20 B5.cif

The data path is organized in five parallel
units, as expected from line 2 in b5.mac. There are
seven states in the FSM, so only three of the five
instantiated sequencer tails are connected to control (the
other two are vestigial: instantiated, yet not used).
Since four integer values were used in the comparisons,
the data path is required to generate the comparison
integers. This must be considered in designing an
algorithm, in assigning the data word length under the
PROGRAM statement. The maximum score possible for the
blackjack game is 27, so the minimum word width is five.
Another reason for the lengthened data path is the number
of arithmetic tests made. The integer values for hit,
stand, broke, and devalu are made within the data path,
since syntax specifies structure in MacPitts. In the Gray
code decoder, the comparison tests generate combinational
logic in the data path which sends a signal to control. As
more data path tests are required, a longer data path will
result.
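The word-width figure quoted for the blackjack machine follows from the number of bits needed to hold the largest value as an unsigned integer. A minimal sketch, assuming unsigned binary representation (the function name min_word_width is hypothetical):

```python
def min_word_width(max_value: int) -> int:
    """Smallest number of bits that holds max_value as an unsigned integer."""
    # int.bit_length() returns 0 for 0, so force at least one bit.
    return max(1, max_value.bit_length())


# The maximum blackjack score given in the text is 27 (binary 11011).
print(min_word_width(27))
```

Any comparison constant generated in the data path has to fit in the same width, so the largest constant in the algorithm should be checked in the same way as the largest score.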

The Weinberger array of the blackjack chip shows
a multi-level structure similar to the Weinberger array
for the Gray code decoder. As the Weinberger array grows
in complexity, it becomes increasingly difficult to
understand its function in terms of a gate level
equivalent. The correct by construction property of
MacPitts is intended to assure correct operation of large
control path circuits nonetheless. The compilation session
recording in Appendix B shows the MacPitts instantiation
process for the blackjack machine, which follows the same
general scheme as for the Gray code decoder.


D.   MEAD-CONWAY  TRAFFIC  LIGHT  CONTROLLER 

The functional description of the Mead-Conway traffic
light controller is taken from [Ref. 4: p. 85]. The chip
controls a traffic light at a highway-farm road
intersection.

1.    The Algorithm

Design of the algorithm follows principles stated
previously. After the desired function is understood, an
automaton (state diagram) is drawn. From this, the
algorithm is written. The placement of the logic is
determined by syntax, and the selection of storage
entities (flags or registers) follows.

The light controller controls the three-light
traffic signals at the intersection of a busy highway and
a less busy farmroad. The input signals are C (car on the
farmroad), TL (long timeout), and TS (short timeout). The
outputs are ST (start timer), FL0 and FL1 (encode the
color of the farmroad light), and HL0 and HL1 (encode the
highway light color). An FSM is appropriate to represent
the sequential nature of the traffic light cycling. Figure
4.21 shows the state transition diagram, with labels
corresponding to the MacPitts states in the algorithm.

Figure 4.21 Light Controller State Transition Diagram

Next, the algorithm is written. A control path
architecture is chosen for ease in setting the output bits
(initially, the output bits are set individually). Storage
elements (flags) are not needed for this example, since
the outputs are synchronously produced, and constant
throughout a given state. In control path circuits using
Boolean variables, the value goes to FALSE at the next
state transition unless it is explicitly set to TRUE. So
storage of the output values would be required if they
were to be output within a different state from that in
which they are determined. For example, if the light
control signals for the highway yellow (HY) state were
produced in the previous state (HG), then they would
require latching so the correct values would remain after
the state transition. If the chip were to be produced,
however, the outputs would require latching as explained
in the previous section, since the chip clock is many
times faster than the light timer clock.

The output bits which control the farmroad and
highway light colors must be encoded. The following table
is used

HL0    HL1    FL0    FL1

 0      0      0      0      GREEN
 0      1      0      1      YELLOW
 1      1      1      1      RED

and the output bits are explicitly set to Boolean values
in the SETQ forms.

Figure 4.22 is the MacPitts algorithm to create
the traffic light controller. The format is similar to the
previous FSM drivers, with the exception of the absence of
data path combinational logic. The data path width must
nevertheless be declared with the PROGRAM statement. The
data path width is two, to permit instantiation of two
sequencers to cycle through the four states of the FSM.
The initial attempt at lc2 erroneously used a data path
width of five, and the algorithm compiled to cif. The
resulting cifplot had a data path width of five bits,
only two of which were connected to the sequencer tails
to remember and address the states. The other three data
path units took up chip space, but performed no function.

;MEAD-CONWAY LIGHT CONTROLLER
;Set the D.P. width to 2 (4 nodes in FSM dgm)
(program lc2 2
(def 13 power)
(def 1 ground)
(def 2 phia)
(def 3 phib)
(def 4 phic)
;The following 3 SIGNALS are control inputs:
(def c signal input 5)
(def tl signal input 6)
(def ts signal input 7)
;The RESET signal is required for all FSMs:
(def reset signal input 14)
;Define 5 output SIGNALS (=> C. P.) to
;control the TIMER & HW/FR traffic light:
(def st signal output 8)
(def hl0 signal output 9)
(def hl1 signal output 10)
(def fl0 signal output 11)
(def fl1 signal output 12)
;The PROCESS statement implies FSM sequencing,
;the stack depth is zero:
(process light_controller 0
;The HIGHWAY GREEN state; output=f(PS & PI)
;where <hg>=PS, and <C,TL,TS>=PI:
hg
(cond ((not (and c tl))
         (setq hl0 f)
         (setq hl1 f)
         (setq fl0 t)
         (setq fl1 f)
         (setq st f)
         (go hg))
      (t (setq hl0 f)
         (setq hl1 f)
         (setq fl0 t)
         (setq fl1 f)
         (setq st t)
         (go hy)))
;The HIGHWAY YELLOW state and associated
;outputs & state transitions (<go >)
;(see text for output encoding table and
;explanation of state transition syntax):
hy
(cond ((not ts)
         (setq hl0 f)
         (setq hl1 t)
         (setq fl0 t)
         (setq fl1 f)
         (setq st f)
         (go hy))
      (t (setq hl0 f)
         (setq hl1 t)
         (setq fl0 t)
         (setq fl1 f)
         (setq st t)
         (go fg)))
;The FARMROAD GREEN state and associated
;outputs & state transitions:
fg
(cond ((not (or tl (not c)))
         (setq hl0 t)
         (setq hl1 f)
         (setq fl0 f)
         (setq fl1 f)
         (setq st f)
         (go fg))
      (t (setq hl0 t)
         (setq hl1 f)
         (setq fl0 f)
         (setq fl1 f)
         (setq st t)
         (go fy)))
;The FARMROAD YELLOW state:
fy
(cond ((not ts)
         (setq hl0 t)
         (setq hl1 f)
         (setq fl0 f)
         (setq fl1 t)
         (setq st f)
         (go fy))
      (t (setq hl0 t)
         (setq hl1 f)
         (setq fl0 f)
         (setq fl1 t)
         (setq st t)
         (go hg)))))

Figure 4.22 Lc2.mac

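The behavior described by Figure 4.22 can be exercised outside the command interpreter with a small software model. The Python sketch below is not MacPitts output. The names LIGHTS, NEXT, stay, and step are hypothetical, and the output values and branch conditions are transcribed from the SETQ and COND forms of the listing as read here, not from the encoding table.

```python
# Light outputs per state as (hl0, hl1, fl0, fl1), as read from the
# SETQ forms in Figure 4.22 (an assumption of this sketch).
LIGHTS = {
    "hg": (False, False, True, False),
    "hy": (False, True, True, False),
    "fg": (True, False, False, False),
    "fy": (True, False, False, True),
}

# State reached when a state's second (t) clause is taken.
NEXT = {"hg": "hy", "hy": "fg", "fg": "fy", "fy": "hg"}


def stay(state, c, tl, ts):
    """Condition of the first COND clause: remain in the present state."""
    if state == "hg":
        return not (c and tl)
    if state == "fg":
        return not (tl or not c)
    return not ts  # hy and fy both wait for the short timeout


def step(state, c, tl, ts):
    """One Mealy step: return (next_state, (hl0, hl1, fl0, fl1, st))."""
    hold = stay(state, c, tl, ts)
    st = not hold  # the timer is restarted only on a transition
    next_state = state if hold else NEXT[state]
    return next_state, LIGHTS[state] + (st,)


# Walk once around the cycle: car waiting and long timeout expired,
# short timeout, long timeout, short timeout.
state = "hg"
trace = []
for c, tl, ts in [(True, True, False), (True, False, True),
                  (True, True, False), (False, False, True)]:
    state, outputs = step(state, c, tl, ts)
    trace.append(state)
print(trace)
```

Because the outputs depend on both the present state and the present inputs (through ST), the model is a Mealy machine, which matches the convention the text attributes to MacPitts.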
2.    The  Chip 

The cifplot resulting from lc2.mac is shown in
Figure 4.23, and the script of the compilation session is
in Appendix B. The cifplot resembles the previous two
FSM cifplots, but lacks flags and data path logic. The
only registers shown are those which receive and store
state information from the sequencer tail. As usual, they
lie in the data path above the clock drivers. Other than
that, the cifplot for lc2 has no data path. This is
expected in view of the driver algorithm, and the script
file of the compilation shows only six data path
organelles but 43 columns in control. A handcrafted
version of this chip could be produced with just a data
path, if a two phase clock is used. This will be
considered in the next chapter.


Figure 4.23 Lc2.cif

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E.   SUMMARY 

This chapter has considered three examples of MacPitts
sequential logic: the Gray code decoder, the blackjack
game, and the Mead-Conway light controller. In each case,
the Mealy FSM convention of MacPitts led to an easy
transition from state diagram to algorithmic description.
The Mealy architecture is evident in both the MacPitts
algorithm and the resulting chip layout.

In the algorithm, each state is given a name (e.g.,
HIGHWAY GREEN, HIGHWAY YELLOW) and within each state the
outputs are determined with the COND form and set
accordingly. The output is a function of both present state
and present input (e.g., CARS, TIMEOUT_LONG,
TIMEOUT_SHORT).

The same Mealy logic is evident in the circuit layout
(cifplot). The sequencer stores the present state, and
multiplexers driven by the Weinberger array and present
inputs determine the next-state transitioning by
controlling the inputs to the bank of present-state
registers.

Sequential logic in MacPitts can be influenced by the
designer in the same way as combinational logic can, by
explicitly specifying the desired outputs. The alternative
is to specify the outputs as an implicit function of
either inputs (ports, input signals) or intermediate
results (internal signals, flags, registers). In general,
when the explicit specification of outputs is used

(setq score 19)

rather than the functional specification of outputs

(setq score (+ rscore face))

a smaller and faster circuit will result. The explicit
specification of outputs is therefore the preferred
method, though not always possible. If there are many
possible outputs, it may even be better to use the
functional specification of outputs rather than attempting
to specify each one explicitly.

The data path width for a MacPitts sequential machine,
as specified in the PROGRAM statement, must be large
enough to address the number of states. That is, the data
path width must be greater than or equal to log (base 2)
of the number of states in the state transition diagram.
If this condition is not met, MacPitts (the silicon
compiler) will not successfully compile the source
algorithm. The reason for this requirement is the manner
in which MacPitts lays out the sequencer and data path.
The sequencer and data path are laid out contiguously, in
a linear bit-slice configuration. The width of both is the
width of the data path as specified in the PROGRAM
statement (this number is also the number of present-state
registers instantiated). Since there must be the same
number of i/o ports as the data path width, and since all
of these ports may not be used for data i/o, one solution
to the problem of extra ports is to ground them in the
circuit in which the chip is to be used (as suggested for
the Gray code decoder, where only one port was necessary,
but two ports had to be specified to allow enough state
transitions). The alternate solution for the Gray code to
binary conversion routine is to treat the data as a serial
stream, one bit wide. This suggests using SIGNALs (instead
of PORTs) as inputs, and processing the Gray code as Boolean
data instead of integer data. This algorithm is included for
completeness in Appendix B, with the resulting cifplot and
script of the compilation process.
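The width rule above can be stated as a short calculation. The following Python sketch uses hypothetical function names and assumes a plain binary state encoding, one sequencer bit per binary digit of the state number; it also counts the sequencer organelles left unused when the declared width exceeds the minimum.

```python
def min_sequencer_width(num_states: int) -> int:
    """Smallest w with 2**w >= num_states (at least one bit)."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1 and avoids
    # floating-point rounding.
    return max(1, (num_states - 1).bit_length())


def unused_sequencers(declared_width: int, num_states: int) -> int:
    """Sequencer organelles instantiated but not needed to hold the state."""
    needed = min_sequencer_width(num_states)
    if declared_width < needed:
        raise ValueError("data path too narrow to address all states")
    return declared_width - needed


# Blackjack machine: seven states, data path width five.
print(min_sequencer_width(7), unused_sequencers(5, 7))
# First attempt at lc2: four states, data path width five.
print(min_sequencer_width(4), unused_sequencers(5, 4))
```

The two cases agree with the counts given in the text: three connected and two vestigial sequencers for the blackjack chip, and two connected and three unused units in the first attempt at lc2.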

MacPitts provides a convenient method to compare both
Boolean and integer values, which is particularly useful
in the decision-making under a COND. The Boolean
comparisons (Figure 4.22) are used to test the value of a
flag or a signal, and the integer comparisons (Figure
4.19) are used to compare numerical values in ports or
registers. In each case, the result is a Boolean signal to
control which affects subsequent state transitioning or
setting of outputs.

Algorithm design for MacPitts FSMs begins with the
decision of how much data it is desired to process
simultaneously, and in what form that data presents itself
to the chip. For instance, if a serial FSM chip is desired
(e.g., a serial Gray code decoder), the data word is one
bit wide. The inclination is therefore to treat the data as
Boolean type, which is feasible for FSM architectures for
reasons explained previously. The designer is not
constrained to integer data types in this case (although the
examples presented in Figure 4.2 and Figure 4.16 used
integer data types). If the data comes to the chip for
parallel processing in an n-bit word, however, the
inclination is to treat the data as integer type (for
example, the blackjack algorithm). This is not always
possible, for reasons to be explained in connection with
Hamming error correction in Chapter VI (MacPitts does not
permit implicit setting of bits within a data word).

Algorithm design may be viewed as the designer's
influencing of the chip layout. Since circuit structure is a
function of syntax (on a lower level), it is reasonable to
assume that chip layout is a function of algorithm structure
(on a higher level). That is, syntax determines not only the
individual circuit elements (NANDs, ORs, XORs, ports, flags,
registers, etc.) of the chip, but also determines how the
individual elements work in concert. The source algorithm
lc2.mac shown in Figure 4.22 used Boolean control signals as
inputs (C, TL, TS). The resulting cifplot in Figure 4.23
shows a Weinberger array at the bottom, and no data path
except for a bank of two sequencer organelles at the top of
the chip. This chip can be viewed as a control path chip. An
alternate design would use a five-bit word (representing the
signals HL0, HL1, FL0, FL1, and ST) as the output, and
retain the three control signals as inputs. Appendix B shows
dplc2.mac (the data path equivalent of Figure 4.22,
lc2.mac), and the resulting cifplot. The output bits are set
explicitly by setting the output word values in the .mac
file. This results in a larger data path, as expected, since
the output decisions result in data path operations instead
of control path operations. The control path is smaller than
in lc2.cif, since the Weinberger array has fewer decisions
to make. Appendix B also contains the script file of the
compilation of dplc2.mac.

Yet another version of the light controller would assign
the input values to a three bit word (representing C, TS,
and TL), and make the conditional checks on the input
control word with the BIT statement. This solution would
result in a still larger data path and a smaller control
path than the two previous light controller chips. Just as
in any high-level language, there exist many ways of
solving a given problem with MacPitts. The best way to solve
the problem must consider not only the algorithm, but the
structural (layout) consequences of algorithmic syntax. The
"best" solution is arrived at by experience in MacPitts
programming, knowledge of the consequences of syntax, and
finally, iteration toward a better solution (trial and
error).


V.    A  COMPARISON  OF  A  MACPITTS  DESIGN 
WITH  A  HANDCRAFTED  EQUIVALENT 

Previous chapters illustrated some inefficiencies
inherent in the MacPitts layout scheme. The Weinberger array
and the data path both use transverse polysilicon wires for
cross-communication, and poly has the highest specific
resistance of the three possible NMOS wire materials. The
one dimensional river routing method used is not optimal,
because the input, output, and data/control lines required
are long. The sequencer organelles are instantiated
according to the data path width, and not according to the
number of states necessary. The Weinberger array generates
multiple cascaded gates to implement multiple output,
combinational logic functions, causing long signal delays in
comparison to a PLA. A handcrafted version of a functionally
equivalent chip is compared to a MacPitts design to
investigate these differences both quantitatively and
qualitatively.

A.   THE  HANDCRAFTED  TRAFFIC  LIGHT  CONTROLLER 

The standard for this comparison is a handcrafted (CAD)
version of the Mead-Conway traffic light controller which is
compared to the MacPitts generated version in terms of speed
and power consumption. Qualitative observations are also
described.

The custom-made traffic light controller was constructed
on the Caesar VLSI graphics editor with the aid of various
VLSI CAD tools.

1.    Design

The MacPitts-produced traffic light controller was
described in Chapter IV. MacPitts design is just a
matter of generating a prototype MacPitts driver program,
and refining it until an acceptable archetype algorithm is
achieved. This is done in both the command interpreter
(algorithmic optimization), and in Caesar (structural
optimization). Caesar allows the designer to see the
structure and analyze it with power estimators (Powest) and
timing estimators (Crystal, SPICE). Moving pads and deleting
vestigial structures are examples of possible structural
optimizations using Caesar (this procedure should be
considered if the MacPitts chip is to be fabricated).

The standard VLSI design scheme is similar to
MacPitts design in that structure is considered as a
function of behavior. The behavior is not constrained to
follow a given algorithmic syntax, though, as it is in
MacPitts. So custom design is more flexible than silicon
compiler designs are, since the designer can choose any
desired structure to implement the behavior called for.

The standard NMOS PLA is used for the handcrafted
traffic light controller. Mead and Conway [Ref. 4: pp. 80-88]
develop the state transition table for the light controller,
and provide a sticks diagram of the clocked PLA FSM. The
following PLA is based on the Mead-Conway development.

Ousterhout [Ref. 9] illustrates use of Eqntott and
Reference 10 illustrates use of Tpla to generate this PLA.
Eqntott is a VLSI CAD program which takes logic equations as
the input and produces a PLA truth table as the output. This
truth table is the input to Tpla (Technology independent
Programmed Logic Array), and Tpla further allows the
designer to geometrically modify the PLA. The result of Tpla
processing the truth table is a Caesar representation of the
desired PLA. Figure 5.1 shows the input logic equations for
Eqntott, and Figure 5.2 shows the resulting truth table from
Eqntott.

Figure 5.1 Stoplight Logic Equations for Eqntott

Figure 5.2 Truth Table Input for Tpla

The best method to design a PLA is to create the
logic equations as in Figure 5.1, and then use the Unix
pipeline to send the result of Eqntott to Tpla

eqntott [options] infilename | tpla [options] outfilename

The result is a Caesar file of the PLA layout, which must be
converted to cif in Caesar as previously described. Figure
5.3 shows the -trans PLA (inputs and outputs on opposite
sides) generated from the command

eqntott -l -f -R stoplt | tpla -s Btrans -I -O -o stoplt.ca

which took 28 seconds to complete. The eqntott switch -l
means list the truth table, -f means to connect the feedback
paths in the PLA, and -R directs eqntott to minimize the
truth table. The tpla switch -s selects the PLA type (-trans
shown), and -I and -O indicate clocked inputs and outputs.
This command string creates an NMOS FSM Caesar file. It was
determined later that a -cis PLA (input and output on the
same side of the PLA) would fit the chip frame better. The
change is simple. The same command string as above was
issued, except Bcis was substituted for Btrans.

The PLA is a fast structure. Appendix A shows the
interactive Crystal session showing the timing analysis of
just the PLA. The delays are determined to be 26.93 ns for
phia and 32.06 ns for phib. For symmetric phia and phib
durations, with each having the duration of the slowest
critical path, or 32.06 ns, the maximum clock rate is 15.6
MHz. The maximum clock rate is calculated as the inverse of
twice the slowest critical path time. The use of Crystal on
non-overlapping, two-phase clocking schemes is described in
[Ref. 3: pp. 80-93].
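The clock-rate arithmetic can be checked directly. In the Python sketch below the function name max_clock_mhz is hypothetical. The symmetric case follows the rule stated in the text. The asymmetric case, in which each phase lasts only as long as its own critical path, is an assumption of this sketch, not a procedure described in the text.

```python
def max_clock_mhz(*phase_durations_ns: float) -> float:
    """Clock rate in MHz for one cycle made up of the given phase durations."""
    return 1.0e3 / sum(phase_durations_ns)


# Critical path delays reported by Crystal for the PLA alone.
phia_ns, phib_ns = 26.93, 32.06

# Symmetric phases: both as long as the slowest critical path.
slowest = max(phia_ns, phib_ns)
symmetric_mhz = max_clock_mhz(slowest, slowest)

# Assumed alternative: each phase only as long as its own delay.
asymmetric_mhz = max_clock_mhz(phia_ns, phib_ns)

print(round(symmetric_mhz, 1), round(asymmetric_mhz, 2))
```

The symmetric result rounds to the 15.6 MHz computed above. The asymmetric result rounds to 16.95 MHz, which appears to correspond to the PLA-only speed quoted later in this chapter.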

Figure 5.3 -Trans PLA Resulting from Eqntott and Tpla

The sequential logic for the light controller chip
is made with the University of Washington/Northwest
Consortium CAD tools as described above. All that is lacking
is the power and ground connections and the pads. Usually
the power and ground busses are laid out by hand (Caesar) or
specified in cartesian coordinates (CLL, Chip Layout
Language, a method of specifying mask polygons, their
dimensions, and the fabrication process required), and the
pads are then invoked from an existing library of VLSI macro
cells. MacPitts can shorten the design time by doing most of
this work for the designer. Figure 5.4 shows the algorithm
stoplt_frm.mac used to create the frame for the PLA FSM. The
frame is created like wire.mac (Figure 2.1), in that it is
just wires from input to output. The wires are deleted in
Caesar, and the PLA is placed in the center of the chip
frame. Figure 5.5 shows the resulting chip. The clocked
-cis PLA is in the center of the chip, connected to
appropriate inputs and outputs (tpla makes this connection
easy, since it labels all inputs and outputs). The third
clock pad (phic) is deleted in Caesar. This chip still has
long indirect metal runs and lots of white space.

2.    Optimization and Analysis

Figure 5.6 shows a condensed version of the chip,
stoplt_minc.cif. The area of the chip shown in Figure 5.6 is
40% smaller than the chip in Figure 5.5, and still more
reduction is possible. Since there are 12 pads, it would be
better to place three per edge on the chip. The signal wires
could also be shortened by judicious choice of pad placement
in the .mac algorithm. And finally, all sides could be
brought closer together. There exists a synergistic
relationship between the existing CAD tools and MacPitts
that bears further study.
Intervention by the designer, however, is antithetical to
the goal of silicon compilation. The silicon compiler has a
ruleset which (in theory) guarantees the property of
"correct by construction". This property states that the
chip design will always be functionally correct; it cannot
be wrong. Circuit density is not the primary goal, nor is
speed.

;stoplt_frm.mac
;This pgm creates a design frame for the stoplight
;controller [cf. Mead & Conway, p. 81, 2nd printing]
;hand-crafting will be required to merge the PLA
;FSM created by eqntott|tpla into this frame. CAESAR
;is used to do this.
(program stoplt_frm.mac 5
(def 13 power)
(def 1 ground)
(def 2 phia)
(def 3 phib)
(def 4 phic)
;inputs to light controller PLA FSM
(def c signal input 5)
(def tl signal input 6)
(def ts signal input 7)
;outputs from light controller PLA FSM
(def st signal output 8)
(def hl0 signal output 9)
(def hl1 signal output 10)
(def fl0 signal output 11)
(def fl1 signal output 12)
(always
;
;here we setq 5 simple dummy paths. These are chosen with a
;view towards later simple editing in CAESAR
;
(setq st c)
(setq hl0 tl)
(setq hl1 ts)
(setq fl0 c)
(setq fl1 tl)))

Figure 5.4 Stoplt_frm.mac

Figure 5.5 Stoplt_chp.cif

Figure 5.6 Stoplt_minc.cif

The MacPitts designer has no control over circuit
density, other than Boolean optimization of the algorithmic
forms as explained in Chapters II and III. The designer does
have some control over chip speed. There are two ways of
optimizing throughput in a MacPitts design. The first method
is explained at the beginning of Chapter III, and can be
thought of as algorithmic optimization. The objective is to
write an algorithm which executes in a minimum number of
clock cycles. The verification is done in the command
interpreter. PAR, COND, and PROCESS are used wherever
possible to parallel operations.

The  second  method  of  controlling  chip  speed  is 
through  circuit  optimization  (this  too  is  a  function  of 
syntax  in  MacPitts).  The  designer  chooses  either  the  data 
path  or  the  control  path  or  a  hybrid  of  both,  and  with 
Crystal  designs  a  chip  which  has  a  maximum  speed  per  clock 
cycle.  The  throughput  is  then  the  product  of  the  inverse  of 
the  number  of  clock  cycles  required  for  a  valid  result  and 
the  cycle  rate  (results/cycle  x  Hz  =  results/sec). 
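The throughput relation in the last sentence can be written as a one-line function. This Python sketch is illustrative only: the function name is hypothetical, and the example figures (four cycles per result, a 5 MHz clock) are assumed values, not measurements from this work.

```python
def throughput(cycles_per_result: int, clock_hz: float) -> float:
    """Results per second = (results per cycle) x (cycles per second)."""
    return (1.0 / cycles_per_result) * clock_hz


# Assumed example: one valid result every four clock cycles at 5 MHz.
print(throughput(cycles_per_result=4, clock_hz=5.0e6))
```

Halving the cycle count (algorithmic optimization) and doubling the clock rate (circuit optimization) have the same effect on this product, which is why the two methods are treated as complementary.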


Furthermore, the circuit speed can be increased by
judicious placement of pads in the .mac file. It is not
always apparent where the routing will go beforehand, so the
recommended method is to create a prototype cifplot, and
then modify the pad numbering in the .mac file to decrease
signal path lengths from the pads to the logic elements. For
example, in stoplt_minc.cif (Figure 5.6), the phia pad would
be moved to center left on the chip frame, phib to center
right, ground to top right, and C, TL, and TS would be moved
to the lower left corner region to decrease metal run
lengths. All of these suggested moves are not possible due
to the way MacPitts places pads, so Caesar editing is
required to optimize the MacPitts design if minimal length
runs are desired.

Appendix C contains the Crystal analysis of the PLA
traffic light controller. The chip speed is limited to the
inverse of the sum of the critical propagation times, or
6.85 MHz. This is less than half the speed of just the PLA
(16.95 MHz). Appendix C also contains the Powest analysis of
the PLA traffic light controller.

B.   COMPARISON WITH MACPITTS DESIGN

Appendix C contains the Crystal command file for the
MacPitts traffic light controller timing analysis. Froede
[Ref. 3: pp. 80-85] explains the analysis of a MacPitts
design with Crystal. The Crystal command file in Appendix C
shows just the commands issued to Crystal, and in
parentheses to the right, the time delay values returned
(representing an actual Crystal session).

Figure 4.23 shows the chip on which this Crystal
analysis was done. The critical path is from phic to the
clock drivers to the state registers. The clock drivers
induce a cumulative delay of 23.9 ns, and the state
registers a cumulative delay of 114.2 ns, so the transition
induces a delay of 90.3 ns. The Weinberger array induces
another 178 ns, and the slowest path is from there to the ST
pad. The total delay is 363.52 ns, for a maximum speed of
2.75 MHz. This speed is 40% of the maximum speed of the PLA
light controller.
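As a check on the arithmetic, the conversion from critical-path delay to maximum clock rate can be sketched in Python. The function name is an assumption; the delay and the 6.85 MHz PLA figure are the values quoted in the text.

```python
def max_frequency_mhz(delay_ns):
    """Maximum clock rate in MHz for a critical-path delay in ns."""
    return 1.0e3 / delay_ns


macpitts_mhz = max_frequency_mhz(363.52)   # total MacPitts delay from the text
pla_mhz = 6.85                             # PLA controller speed from the text

print(round(macpitts_mhz, 2))              # clock rate of the MacPitts chip
print(round(macpitts_mhz / pla_mhz, 2))    # fraction of the PLA chip speed
```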

Figures 5.7 and 5.8 show the floorplans of each version
of the traffic light controller. Figure 5.7, the PLA FSM, is
comparatively simple. The FSM is a small clocked PLA with
feedback. The connections to the pads are all metal (not
shown). Figure 5.8 is the MacPitts version, and is far more
complicated. The control path is large, and induces the
largest part of the delay. The present state (PS in Figure
5.8) - next state mechanism is much more complex than the
simple PLA feedback generated by eqntott and tpla. The wires
between the data and control paths are poly, as are the PS
feedback lines in Figure 5.8. These wires contribute to the
slowness of the MacPitts chip. The wires to the pads also
take a more circuitous route, inducing still more delay.



Figure  5.7  PLA  Stoplight  Chip  Floorplan 
Figure 5.8  MacPitts Stoplight Chip Floorplan

Table 5.1 compares the MacPitts traffic light controller and
the PLA traffic light controller.


TABLE 5.1

MACPITTS vs. HANDCRAFTING

                                  PLA Chip       MacPitts Chip

delay [ns]                        146.98         501.97
max. clock freq. [MHz]            6.85           1.99
pullup transistors                35             87
avg. DC power [W]                 .042           .055
max. DC power [W]                 .085           .107
control path dimensions [mm]      .49 x .29      .547 x .185
data path dimensions [mm]         .178           .256 x .240
area ratio [cp/dp]                .046           8.9
chip size [mm**2]                 .836           1.164

VI.  DESIGN  EXAMPLE: HAMMING  ERROR  DETECTOR/CORRECTOR 

This chapter describes one method of design with
MacPitts. The procedure is to first define the problem, then
to write an initial algorithmic description of the solution
in MacPitts (the language). The initial algorithm is either
a simplified version, or a piece of the larger problem. The
simplified algorithm is tested for execution in the
interpreter, and then compiled to cif. Alternate solutions
are considered next, and simplified alternate solutions are
likewise tested. The best of these algorithms is then
chosen, based on speed, power dissipation, and size. The
chosen solution is then expanded to solve the larger
problem.

The problem is to design a parallel Hamming method error
detector/corrector which will correct single bit errors in a
15-bit encoded message.

A.   THE  ERROR  DETECTOR 

The theory behind Hamming error detection and correction
is found in most texts on coding and information theory
[Ref. 5: pp. 39-49]. A subset of this problem is error
detection, which the prototype algorithm solves.

The prototype algorithm looks at a three bit encoded
message in parallel, and by the Hamming method determines
the bit error location. The algorithm is written to
demonstrate correct operation for three-bit messages. It can
later be expanded to cover longer word lengths.

The Hamming method scans the encoded word, and by a
series of parity checks determines the bit error position.
The single error detection method assigns the result of each
parity check to a bit of data. The word formed from the
resulting bits comprises the syndrome. The value of the
syndrome is the bit error position in the received message.

The parity checking is done in a specific order. If the
codeword is a string of n bits with the lsb leading

0 1 2 3 4 5 6 7 8 ... n

then the syndrome bits are determined by parity checks
across the message bits as shown below.

syndrome bit    message bit positions for parity check

0               0 2 4 6 8 10 12 14 16 18 20 ...
1               1 2 5 6 9 10 13 14 17 18 21 22 ...
2               3 4 5 6 11 12 13 14 19 20 21 22 ...
3               7 8 9 10 11 12 13 14 23 24 25 26 27 ...

The syndrome word is read from msb to lsb and points
to the message bit which needs correcting.
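The pattern in the parity check table can be generated mechanically. The Python sketch below assumes the rule implied by the table: message position p (counted from 0, lsb leading) takes part in syndrome bit k when bit k of p + 1 is set. The function name is hypothetical.

```python
def check_positions(k, n):
    """Message positions (0-based, lsb leading) checked by syndrome bit k
    in an n-bit codeword."""
    return [p for p in range(n) if ((p + 1) >> k) & 1]


# Reproduce the rows of the table for a 15-bit codeword.
for k in range(4):
    print(k, check_positions(k, 15))
```

With this rule a single error at position p sets exactly the syndrome bits that spell out p + 1, which is why the syndrome value points at the erroneous bit.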

For instance, for an encoded seven bit message, there
are three check bits (represented by "c"), and four bits of
information (represented by "i") in the positions indicated
below.

c c i c i i i

The first bit of the syndrome (lsb) is determined by parity
checks over positions 0, 2, 4, and 6. The next bit of the
syndrome considers positions 1, 2, 5, and 6. The last bit of
the syndrome (msb) is determined from message positions 3,
4, 5, and 6. The three-bit syndrome indicates the error
position in the message string. If the received message is
0100011, the syndrome generated is 011. The syndrome
indicates an error in the third bit from the left (the lsb
leads the string). The correct message is 0110011. The
Hamming method corrects (complements) the third symbol.
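The worked example can be reproduced with a short Python sketch. It is a software model only, not MacPitts code; the function names are assumptions, and the message string is written with position 0 first, as in the text.

```python
def syndrome(bits):
    """Return the Hamming syndrome of a list of 0/1 values (position 0 first)."""
    n = len(bits)
    value = 0
    k = 0
    while (1 << k) <= n:
        parity = 0
        for p in range(n):
            if ((p + 1) >> k) & 1:      # position p takes part in check k
                parity ^= bits[p]
        value |= parity << k
        k += 1
    return value


def correct(bits):
    """Complement the bit the syndrome points at (syndrome 0 means no error)."""
    s = syndrome(bits)
    fixed = list(bits)
    if s:
        fixed[s - 1] ^= 1
    return fixed


received = [int(c) for c in "0100011"]
print(format(syndrome(received), "03b"))          # 011
print("".join(str(b) for b in correct(received)))  # 0110011
```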
1.    Design Considerations

Previously in this research it was noted that
MacPitts syntax does not permit explicit bit manipulation in
the data path. Doing this algorithm in the data path may be
desirable, in view of the speed of simple data path
functions. Since this is not possible, perhaps a hybrid data
path-control path algorithm should be considered. A review
of the Gray code decoder chip (Figure 4.2) shows why
this is not a good approach. The Gray code decoder is a
mixed structure, having both a data path and a control path.
The interconnections are all poly, which slows the chip
down. The multiple unPARalleled CONDs have a more
detrimental effect on speed, since each requires a clock
cycle to execute if its antecedent is true. So the target
architecture will be Boolean (control path).

The parity checks can be done by a variety of
methods in MacPitts. The simplest way is with the built-in
library function PARITY, which has the format

(parity boolean boolean ...)

PARITY performs modulo two addition, and returns Boolean
TRUE to control if the arguments contain an odd number of
TRUEs, or Boolean FALSE if they contain an even number of
TRUEs. So the parity checks can be done directly on the bits
of the message, in parallel, with the PARITY statement.
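The behavior described for PARITY can be modeled in Python as follows. This is a sketch of the semantics stated above, not the MacPitts library source; the Python function name simply mirrors the MacPitts form.

```python
def parity(*args):
    """Modulo-two sum of Boolean arguments: True for an odd number of TRUEs."""
    return sum(1 for a in args if a) % 2 == 1


print(parity(True, False))         # one TRUE: odd
print(parity(True, True))          # two TRUEs: even
print(parity(True, True, True))    # three TRUEs: odd
```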

MacPitts also has a method of checking specific
bits in a data word. The BIT statement looks at a bit in the
integer-valued word, and returns a TRUE to control if the
bit is one, or a FALSE to control if the bit is zero. The
form of the BIT statement is

(bit <bit_position> <integer_expression>)

Figure 6.1 is the algorithm tst.mac, used to test the BIT
statement. It is similar functionally to wire.mac, in that
it sets an output bit to an input bit. The difference is
that BIT permits a bit-by-bit conversion from integer value
to Boolean value. In Figure 6.1, the input word mesg is
integer valued. The output bits are Boolean signals (outx),
and they are setq'd to the respective bit position values of
mesg (the corrupted input word).
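A software model of the BIT form, and of what tst.mac does with it, might look like the following Python sketch. The names bit and tst are taken from the MacPitts text for readability, but the Python functions themselves are assumptions made for illustration.

```python
def bit(position, word):
    """True if the given bit of an integer-valued word is one."""
    return ((word >> position) & 1) == 1


def tst(mesg):
    """Model of tst.mac: each output signal is set to one bit of mesg."""
    out0 = bit(0, mesg)
    out1 = bit(1, mesg)
    out2 = bit(2, mesg)
    return out0, out1, out2


print(tst(0b101))
```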
2.    Prototype Error Detector

Knowing   Hamming   error  detection   theory   and   the 
PARITY  and  BIT  statement  syntax,  an  error  detector  algorithm 


;TST.MAC
;A MacPitts algorithm for bit-setting of output ports
;The BIT form is used to select a specific bit of the
;input data word, and an output signal is set to
;The value of the bit selected.

;Require a D.P. width of three to accommodate the input
(program tst 3
  (def 1 ground)
  (def 2 phia)
  (def 3 phib)
  (def 4 phic)
;Use a 3-bit INTEGER as input PORT:
  (def mesg port input (5 6 7))
;Use 3 BOOLEAN SIGNALS as outputs:
  (def out0 signal output 8)
  (def out1 signal output 9)
  (def out2 signal output 10)
  (def 11 power)
;Perform bit-setting on each clock cycle:
  (always
;Select which bit of the input word is to
;Be SETQ'd to the output signal pads:
    (setq out0 (bit 0 mesg))
    (setq out1 (bit 1 mesg))
    (setq out2 (bit 2 mesg))))


Figure 6.1  Tst.mac


can be written. The encoded message input (mesg) is
word-valued, three bits wide. The output syndrome (syndx) is
two Boolean signals. The algorithm is shown in Figure 6.2.
The semantics of the MacPitts algorithm follow the English
description of the problem statement. The appropriate bit
patterns of the message are checked, and the syndrome bits
are set based on the results of the parity checks. This
algorithm was exhaustively tested in the command
interpreter, and serves as the prototype for the error


;HAM3.MAC
;A MacPitts algorithm for single-error detection
;using the Hamming method.
(program ham1 3     ;note width of data path (=width of msg)
  (def 1 ground)
  (def 2 phia)
  (def 3 phib)
  (def 4 phic)
;mesg is the input data word of 3 bits width with possible errors
  (def mesg port input (5 6 7))
  (def synd1 signal output 8)
  (def synd2 signal output 9)
  (def 10 power)
  (always
;For a 3 bit word, two parity checks are required. The
;result of these parity checks is a 2 bit syndrome, which
;indicates the bit position of the error in the 3 bit word.

;this cond sets or clears the lsb of the syndrome.
    (cond
      ((parity (bit 0 mesg) (bit 2 mesg))
        (setq synd1 t))
      (t
        (setq synd1 f)))
;This cond sets or clears the msb of the syndrome.
    (cond
      ((parity (bit 1 mesg) (bit 2 mesg))
        (setq synd2 t))
      (t
        (setq synd2 f)))))


Figure    6.2    Ham3.mac 

detector. The algorithm compiled to cif, and Figure 6.3
shows a logic structure completely in the control path. The
parallel lines at center left are the input (mesg) bits, and
result from the BIT statement. They go to the right side of
the Weinberger array, where they fan out to multiple NOR
gate inputs.
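The exhaustive test mentioned above was run in the MacPitts command interpreter. A comparable check can be sketched in Python, assuming the two parity checks of Figure 6.2 and assuming that the valid three-bit codewords are 000 and 111 (one data bit with two check bits). The function name is hypothetical.

```python
def ham3(mesg):
    """Model of ham3.mac: return (synd1, synd2) for a 3-bit integer word."""
    b = [(mesg >> i) & 1 for i in range(3)]
    synd1 = b[0] ^ b[2]     # parity over bits 0 and 2 (lsb of syndrome)
    synd2 = b[1] ^ b[2]     # parity over bits 1 and 2 (msb of syndrome)
    return synd1, synd2


# Exhaustive listing over all eight input words.
for mesg in range(8):
    print(format(mesg, "03b"), ham3(mesg))

# A single error at bit p of a valid codeword gives syndrome value p + 1.
for codeword in (0b000, 0b111):
    for p in range(3):
        synd1, synd2 = ham3(codeword ^ (1 << p))
        assert synd1 + 2 * synd2 == p + 1
```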
Figure 6.3  Ham3.cif

3.    Expanded Prototype

The three bit Hamming error detector is the trivial
case. The decision is in favor of the winning bits ("two out
of three"), so the syndrome is not really necessary unless
the check bits are wrong (a possibility for which the
Hamming code allows).

The Hamming code is uniform in its protection,
however; once encoded there is no difference between the
message bits (i) and the check bits (c). This is important
in checking longer words for errors. A seven bit message is
checked as in the example given above. Elaborating on the
prototype, Figure 6.4 shows the algorithm to generate the
syndrome for a seven bit parallel error detector. This error
detector requires a three bit syndrome to point at one of
the possible seven error bits in the message. Section A
above illustrates the syndrome generation process, and how
the syndrome word points at the erroneous message bit. The
resulting cifplot is shown in Figure 6.5, and the structure
is similar to the Weinberger array for the three-bit error
detector.

It is good practice to expand the algorithm in
steps, instead of going directly from the prototype to the
final design. Unexpected results can be dealt with better if
this approach is followed.


;HAM7.MAC
;A MacPitts algorithm to implement a 7 bit message error
;correction chip. The Hamming method is used. Four of the
;7 bits are data bits, 3 of the 7 are parity check positions.
(program ham7 1
  (def 1 ground)
  (def 2 phia)
  (def 3 phib)
  (def 4 phic)
  (def msg port input (5 6 7 8 9 10 11))
  (def synd1 signal output 12)
  (def synd2 signal output 13)
  (def synd3 signal output 14)
  (def 15 power)
;The Hamming method uses parity checks over bit positions
;1,3,5, and 7 to set the lsb of the syndrome,
;checks over positions 2,3,6, and 7 to set the middle synd bit,
;and checks over positions 4,5,6, and 7 to set the msb of the
;syndrome. The value of the syndrome indicates the bit error
;position in the 7 bit message.
  (always
;set lsb of syndrome:
    (cond
      ((parity (bit 0 msg) (bit 2 msg) (bit 4 msg) (bit 6 msg))
        (setq synd1 t))
      (t
        (setq synd1 f)))
;set middle bit of syndrome:
    (cond
      ((parity (bit 1 msg) (bit 2 msg) (bit 5 msg) (bit 6 msg))
        (setq synd2 t))
      (t
        (setq synd2 f)))
;set msb of syndrome:
    (cond
      ((parity (bit 3 msg) (bit 4 msg) (bit 5 msg) (bit 6 msg))
        (setq synd3 t))
      (t
        (setq synd3 f)))))


Figure    6.4    Ham7.mac 
Figure 6.5  Ham7.cif

4.    Error Detector

The desired algorithm is to uniformly detect errors
in a 15 bit message. Remembering the surprising inability of
MacPitts to compile a six input/one output gate in the data
path, a test algorithm was written for the larger message.
Figure 6.6 is the algorithm to detect errors in a 15 bit
encoded message. The syndrome bits are determined from the
parity checks as follows.


syndrome    message bit check positions

synd1       0 2 4 6 8 10 12 14
synd2       1 2 5 6 9 10 13 14
synd3       3 4 5 6 11 12 13 14
synd4       7 8 9 10 11 12 13 14


The single error detection scheme requires four
bits to select the message bit for correction, thus the four
bit syndrome. Synd1 is the lsb and synd4 is the msb of the
Boolean syndrome word. Figure 6.7 shows the cifplot
resulting from ham15.mac. The structure is predictably
similar to ham3.cif and ham7.cif (Figure 6.3, Figure 6.5).
This algorithm serves as the archetype (chief model, as
opposed to prototype, first model) for the error detector.
The error detector is half of the solution; the other half
is correction of the errors. The detection is feasible, as
proven by this algorithm.
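The four parity checks listed above can be exercised in a Python model to confirm that every single-bit error in a 15 bit word produces a distinct syndrome pointing at the error. This is a sketch only: the dictionary and function names are assumptions, and the all-zero word is used as the valid codeword.

```python
# Check positions transcribed from the list above (0-based, lsb leading).
CHECKS = {
    "synd1": [0, 2, 4, 6, 8, 10, 12, 14],
    "synd2": [1, 2, 5, 6, 9, 10, 13, 14],
    "synd3": [3, 4, 5, 6, 11, 12, 13, 14],
    "synd4": [7, 8, 9, 10, 11, 12, 13, 14],
}
ORDER = ("synd1", "synd2", "synd3", "synd4")   # lsb first


def ham15(msg):
    """Return (synd1, synd2, synd3, synd4) for a list of 15 bits."""
    return tuple(sum(msg[p] for p in CHECKS[name]) % 2 for name in ORDER)


def syndrome_value(synd):
    """Read the syndrome bits as an integer, synd1 being the lsb."""
    return sum(b << k for k, b in enumerate(synd))


# Inject one error at each position of the all-zero codeword.
for pos in range(15):
    msg = [0] * 15
    msg[pos] = 1
    assert syndrome_value(ham15(msg)) == pos + 1
```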

Table   6.1   shows   a  comparison  between   the   three 
error  detectors. 
Figure 6.6  Ham15.mac

Figure 6.7  Ham15.cif


TABLE 6.1
THREE ERROR DETECTORS

                              HAM3       HAM7       HAM15

Chip area [mm**2]             3.473      4.812      11.113
Control path area [mm**2]     2.75       1.918      8.025
Number pullups [in control]   9          31         71
Number pads                   10         15         24
MacPitts pwr. [W]             .03194     .06094     .12265
Powest pwr. (avg) [W]         .02170     .03808     .06191
Powest pwr. (max) [W]         .04341     .07379     .11746
Max. delay [ns]               51.54      296.23
Max. frequency [MHz]          19.40      3.34       1.73
Cycles/result                 1
Throughput [results/sec]      19.40M     3.34M      1.73M


So this method of parallel error detection appears
feasible for word lengths less than 16 bits. The speed is
fast due to the chosen single-state MacPitts architecture
(ALWAYS = one PROCESS with zero stack depth, or for this
purpose, a single-state FSM). These chips are unclocked
circuits. The throughput is not a function of the clock
rate, but depends on the signal propagation time from input
to output. The propagation time sets the upper limit on
throughput, and the capacitive leakage from the Weinberger
array gates sets the lower limit on throughput. If the error
detectors are used in a slow system, the outputs must
therefore be latched to maintain valid logic levels. This is
easily done with MacPitts, by SETQing the results to flags,
and subsequently clocking the flags to output signal ports.

B.   HAMMING  METHOD  15/4  ERROR  CORRECTOR 

The previous section is only part of the story. Once the
error bit in the message has been located, it must be
corrected. The decision of how to implement the error
detector was a simple one, constrained by syntax. The error
detector/corrector invites other methods of implementation.

1.    Design Considerations

The message bit error is pointed at by the syndrome
bits (the syndrome indicates the erroneous bit position).
The error bit needs to be complemented, and the correct
message results. The corrected message is then fed to the
output ports. In this application, the extraneous check bits
are discarded. The check bits (c) are used to encode the
original message, and after reception and decoding they
serve no purpose.

The message error detection and correction
procedure can be reduced to three steps:


1.  locate  the  error 

2.  complement  the  error  bit 

3.  set  the  corrected  output  word  bits 


The first step is done with the error detection
part of the algorithm. The second step is straightforward in
MacPitts. Either the output bit is the input message bit
(the correct message bit case), or else the output bit is
the complement of the corresponding message bit (the
incorrect message bit case). The checking is done with the
COND form in MacPitts.

The third step involves discarding the check bits,
setting the correct output bits to the corresponding input
bit values, and sending the complement of the erroneous
input bit to the corresponding output bit position.
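The three steps can be sketched for a seven bit message in Python. This is a software model under stated assumptions, not the MacPitts implementation: the function name is hypothetical, the message is a list with position 0 first, and the data bits are taken to be positions 2, 4, 5, and 6 as in the seven bit example of Section A.

```python
DATA_POSITIONS = (2, 4, 5, 6)   # information bits in the pattern c c i c i i i


def ham7_correct(msg):
    """Return the four corrected data bits of a 7-bit message."""
    # Step 1: locate the error with the three parity checks.
    s0 = msg[0] ^ msg[2] ^ msg[4] ^ msg[6]
    s1 = msg[1] ^ msg[2] ^ msg[5] ^ msg[6]
    s2 = msg[3] ^ msg[4] ^ msg[5] ^ msg[6]
    error_position = (s0 + 2 * s1 + 4 * s2) - 1   # -1 means no error

    # Steps 2 and 3: complement the erroneous bit if it is a data bit,
    # pass the other data bits through, and discard the check bits.
    return tuple(
        (msg[p] ^ 1) if p == error_position else msg[p]
        for p in DATA_POSITIONS
    )


received = [int(c) for c in "0100011"]
print(ham7_correct(received))
```

An error in a check bit leaves the data bits untouched, since the erroneous position is then not one of the output positions.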
2.    Prototype Designs

Bit manipulations require Boolean data types, so
flags and signals are used. The flags store the computed
syndrome bits, and the signals are used for input and
output. Figure 6.8 shows the MacPitts driver, ham3c.mac.

There are three COND statements in ham3c.mac. The
first two determine the results of the message parity
checks, as in the error detection algorithms. The last COND
sets the single message bit according to the result of the
parity checks. If fs1 (flag, synd1) is FALSE and fs0 is
TRUE, then the message bit is incorrect. The output is then
set to the complement of the input bit value. If the form
under the last COND is FALSE, then either there is no error
in the message, or one of the two check bits is
incorrect. In either case, the input message data bit is
correct, so the output data bit (out0) is set to the lsb of
the input message (msg0).

The format of the input is three symbols, two of
which are check bits and one data (information) bit.


bit position      0  1  2
bit function      c  c  i

Only the last bit is returned from the error
correction routine; the two check bits (inserted in the
encoding of the message) are useless at this point. The last
bit is the result of the error correction process, and is
also the output of the prototype design. The algorithm
(ham3c.mac) has the syndrome bits declared as output
signals. This is considered good programming form (MacPitts
being both a language and a silicon compiler), and allows
troubleshooting the algorithm at run time. The syndrome
outputs are unnecessary for the error corrector chip, and
are deleted after verification of the algorithm in the
command interpreter.

The resulting cifplot is Figure 6.9. The BIT
organelles are absent, but two data path organelles
corresponding to the flags fs1 and fs0 are instantiated.
These are the storage elements for the computed syndrome
;HAM3C.MAC
;MacPitts algorithm for single-error detection & correction.
;This algorithm serves as a paradigm for the Hamming single
;error detection and correction problem.
(program ham1 3
  (def 1 ground)
  (def 2 phia)
  (def 3 phib)
  (def 4 phic)
;msg(n)  :   the input datum and 2 parity check bits
;out0    :   the corrected datum
;synd(n) :   the bit-checked Hamming error syndromes
;fs(n)   :   integer storage flags for the syndrome states
  (def msg2 signal input 5)
  (def msg1 signal input 6)
  (def msg0 signal input 7)
  (def out0 signal output 8)
  (def synd1 signal output 9)
  (def synd0 signal output 10)
  (def fs0 flag)
  (def fs1 flag)
  (def 11 power)
  (always     ;a 1 state FSM
    (cond     ;set the lsb of the error-bit syndrome:
      ((parity msg0 msg2)
        (setq synd0 t) (setq fs0 t))
      (t
        (setq synd0 f) (setq fs0 f)))
    (cond     ;set the msb of the error-bit syndrome:
      ((parity msg1 msg2)
        (setq synd1 t) (setq fs1 t))
      (t
        (setq synd1 f) (setq fs1 f)))
    (cond     ;the fs(n) flag states determine whether
              ;the output datum requires correction.
      ((and (not fs1) fs0)
        (setq out0 (not msg0)))
      (t
        (setq out0 msg0)))
  ))


Figure  6.8  Ham3c.mac 
values. The Weinberger array writes to and reads from these
flags, as the algorithm suggests. An implication of this
hybrid (data path and control path) structure is slower
speed. This does not necessarily denote slower throughput,
but slower signal speed across the logic circuitry.

To the right of the two flags is a bank of three
dual cascaded vertical inverters. This structure performs a
function analogous to what the clock drivers do for data
path registers (superbuffering and sequencing of the three
phases).

Just  as  the  error  detector  was  tested  for  the  three 
bit,  seven  bit,  and  15  bit  cases,  so  is  the  error  corrector 
tested  next  for  the  case  of  a  seven  bit  message  (the  error 
corrector  incorporates  the  error  detector  in  its  logic). 

This section suggests a method whereby the designer
can optimize the MacPitts chip. Three solutions to the error
detection/correction problem are considered. Each is
investigated, and the best solution is chosen as the
archetype for the final 15 bit error corrector chip. The
archetype is chosen on a seven bit basis instead of the
simpler three bit chip. The seven bit error
detector/correctors require more time to design and analyze,
but their performance is more representative of the desired
chip's than is the three bit detector/corrector.

The first method is an elaboration on ham3c.mac.
The algorithm is shown in Figure 6.10, and the cifplot is
Figure 6.11. This algorithm uses three flags (fs0, fs1, and
fs2) to store the individual syndrome bits. The syndrome
bits are subsequently tested in the Weinberger array, and
used to selectively set the four output bits of the
corrected message (out6, out5, out4, and out2). This
solution has the advantage of clarity, and the disadvantage
of slowness due to the hybrid structure and poly run
lengths. In comparing this algorithm to Figure 6.8
(ham3c.mac), it can be inferred that the number of COND
statements in the error detection part of the algorithm is
always the same as the number of parity checks needed.
Similarly, the number of CONDs in the error correction part
equals the number of output data bits.

This version of the chip requires two clock cycles
to produce an output (write the error syndromes to the
flags, then read the flags to determine the correct output).
The throughput is 316,180 results/sec. A result is taken to
be a corrected data word, in this case, a four-bit word.

Figure 6.12 shows an alternate solution,
ham7cs.mac. This algorithm replaces the three flags with
internal signals, is0, is1, and is2. Internal signals in
MacPitts have the advantage of not requiring time-consuming
storage operations. This architecture reduces the error
corrector to a combinational logic structure, implemented in
the control path due to syntax (all Boolean forms). The
algorithm has a similar structure to the previous one which
used flags to store the syndromes (Figure 6.10). There are
three CONDs to set the syndrome, and four CONDs to set the
output word. The question of internal timing arises: will
MacPitts have the syndrome ready in time for the output word
setting? The answer is yes, because the algorithm executes
sequentially in the order written in the absence of
parallelising forms (COND, PAR, PROCESS).

This algorithm is also faster than the previous
one. The throughput is 2,034,000 words/sec, more than six
times that of the chip using flags to store the syndrome.

Another solution considers the PAR form for
paralleling the CONDs. An increase in speed results if the
three CONDs which set the syndrome are paralleled, and then
the four CONDs which set the output are paralleled with PAR.
The throughput of this chip is 2,208,000 words/sec, slightly
faster than the chip without PARs around the CONDs. This
translates into larger structure (Table 6.2). Figure 6.14 is
the MacPitts driver, ham7cr.mac, and Figure 6.15 is the
cifplot.

This version of the error detector/corrector is the
archetype (chief example) for the 15 bit error
detector/corrector. It was developed based on the three bit
prototype (Figure 6.8), refined, tested with the MacPitts
interpreter and Crystal, and is considered the optimal
MacPitts parallel-architecture solution for the seven bit
correction problem. It serves as the model for building the
Figure 6.10  Ham7cf.mac
;HAM7Cs.MAC
;Hamming 7 bit message error corrector, SIGNALS
(program ham7cs 1
  (def 1 ground)(def 2 phia)(def 3 phib)(def 4 phic)
  (def msg0 signal input 5)(def msg1 signal input 6)
  (def msg2 signal input 7)(def msg3 signal input 8)
  (def msg4 signal input 9)(def msg5 signal input 10)
  (def msg6 signal input 11)
  (def out6 signal output 12)(def out5 signal output 13)
  (def out4 signal output 14)(def out2 signal output 15)
;signals needed to pass the syndrome's bits:
;Use SIGNALS instead of FLAGS:
  (def is2 signal internal)
  (def is1 signal internal)
  (def is0 signal internal)
  (def 17 power)
  (always
;set lsb of syndrome:
    (cond
      ((parity msg0 msg2 msg4 msg6)
        (setq is0 t))
      (t
        (setq is0 f)))
;set middle bit of syndrome:
    (cond
      ((parity msg1 msg2 msg5 msg6)
        (setq is1 t))
      (t
        (setq is1 f)))
;set msb of syndrome:
    (cond
      ((parity msg3 msg4 msg5 msg6)
        (setq is2 t))
      (t
        (setq is2 f)))
;Check data bit 2 (msg bit 3):
    (cond
      ((and (not is2) is1 is0)
        (setq out2 (not msg2)))
      (t
        (setq out2 msg2)))
;Check data bit 4 (msg bit 5):
    (cond
      ((and is2 (not is1) is0)
        (setq out4 (not msg4)))
      (t
        (setq out4 msg4)))
;Check data bit 5 (msg bit 6):
    (cond
      ((and is2 is1 (not is0))
        (setq out5 (not msg5)))
      (t
        (setq out5 msg5)))
;Check data bit 6 (msg bit 7):
    (cond
      ((and is2 is1 is0)
        (setq out6 (not msg6)))
      (t
        (setq out6 msg6)))
  ))


Figure 6.12  Ham7cs.mac
Figure 6.14  Ham7cr.mac
15 bit machine (the seven bit model is easier to analyze in
the interpreter, and with Crystal and Esim).

It is impractical to carry out the preceding design
process beginning with a 15 bit machine. The 15 bit message
cannot be tested in the interpreter (all the inputs and
outputs will not fit on the VT-100 screen), and Caesar and
Crystal analysis is far more complicated with large
structures. It is better to optimize with a smaller model,
and then extend the results to achieve the desired chip.

Table 6.2 is a parametric comparison of the three
Hamming error detector/corrector chips. The reason for the
choice of ham7cr.mac is clear from previous discussion and
these statistics.

TABLE 6.2
CHIP PARAMETRIC COMPARISON

                         HAM7Cf     HAM7Cs     HAM7Cr

Area [mm**2]             7.003      6.305      6.187
Power [W]                .102       .0931      .0931
Delay [ns]               1581.37    491.64     452.94
Speed [MHz]              .6324      2.034      2.208
Cycles/res.              2          1          1
Throughput [res/s]       .316M      2.034M     2.208M
Speed/area [MHz/mm**2]   .0903      .3226      .3579
Density [tran/mm**2]     53.6       45.7       46.6


The reason for the choice of ham7cr as the model is
seen in Table 6.2. The chip (ham7cr) is smaller and faster
than its predecessors. It has the highest throughput of all
the seven bit correctors. The result of using the PAR form
is seen by comparing the speed/area ratios of ham7cs and
ham7cr. PAR translates into more decisions made
simultaneously, and the decisions are made faster
(speed/area is greater). The result of storing the syndrome
bits in flags (ham7cf) is shown in its comparatively low
throughput and low speed/area figures.
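
The derived rows of Table 6.2 follow directly from the area, delay, and
cycle-count rows. The sketch below recomputes them as a check; the
function and variable names are illustrative and are not part of
MacPitts, Crystal, or Powest.

```python
# Recompute the derived rows of Table 6.2 from the measured rows.
# Input values are copied from the table; names are hypothetical.
CHIPS = {
    # name: (area in mm**2, Crystal delay in ns, clock cycles per result)
    "ham7cf": (7.003, 1581.37, 2),
    "ham7cs": (6.305, 491.64, 1),
    "ham7cr": (6.187, 452.94, 1),
}


def derived_metrics(area_mm2, delay_ns, cycles_per_result):
    """Return (speed in MHz, throughput in Mres/s, speed/area in MHz/mm**2)."""
    speed_mhz = 1000.0 / delay_ns              # 1/delay, with ns converted to MHz
    throughput = speed_mhz / cycles_per_result  # results per microsecond
    speed_per_area = speed_mhz / area_mm2
    return speed_mhz, throughput, speed_per_area


def best_chip(chips=CHIPS):
    """Name of the chip with the highest throughput."""
    return max(chips, key=lambda name: derived_metrics(*chips[name])[1])


if __name__ == "__main__":
    for name, params in CHIPS.items():
        speed, thru, ratio = derived_metrics(*params)
        print(f"{name}: {speed:.4f} MHz, {thru:.4f} Mres/s, {ratio:.4f} MHz/mm**2")
    print("highest throughput:", best_chip())
```

The two-cycle flag version (ham7cf) loses throughput twice over: its
delay is longer, and each result needs two clock cycles.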

A functional summary of the three prototype
candidate algorithms (flowcharts and resulting floorplans)
is given in Figures 6.16 - 6.21.

4.  Hamming 15/4 Error Corrector

The 15 bit error corrector is designed after the
PARalleled COND version of the ham7 algorithm, ham7cr.mac
(Figure 6.14). As explained above, the number of CONDs
expected is the sum of the number of syndrome bits and the
number of corrected data bits out. There are four syndrome
bits for the 15/4 code, and 11 corrected data bits out, for
a total of 15 CONDs in the algorithm. Figure 6.22 shows
ham15dc.mac. The algorithm structure is similar to ham7,
except for the pin naming, which has been shortened to make
it easier to enter the data for analysis (Crystal, Caesar
labels, Esim). There are four parity checks across the bits
as described in the paragraph on error detection. The parity
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Figure  6.16  Ham7c-f  Flowchart 
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Figure  6.18  Ham7cs  Flowchart 
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Figure  6.20  Ham7cr  Flowchart 
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checks  result  in  -four  syndrome  internal  signals.  The 
internal signals translate to feedback within the Weinberger
array. After the bit error is identified by the syndrome
pattern, it is corrected. There are 11 CONDs which
accomplish the bit-wise correction of the output word, one
for each bit which is not an encoding bit (the encoding bits
occupy positions 0, 1, 3, and 7).
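
The structure described above (four parity checks producing four
syndrome signals, followed by 11 conditional corrections) can be
sketched behaviorally as follows. This is not the MacPitts source. It
assumes zero-based bit positions with encoding bits at 0, 1, 3, and 7,
and even parity, so that a syndrome bit is set when its check sees an
odd number of ones. The helper encode() is hypothetical and exists only
to produce test words.

```python
# Behavioral sketch of a 15 bit Hamming corrector (assumptions as stated above).
PARITY_POSITIONS = (0, 1, 3, 7)  # encoding-bit positions named in the text
DATA_POSITIONS = tuple(i for i in range(15) if i not in PARITY_POSITIONS)


def syndrome(word):
    """Four parity checks over a list of 15 bits; returns an integer 0..15.

    Check k covers every position i whose one-based index (i + 1) has
    bit k set, mirroring the four (parity ...) tests in the algorithm.
    """
    s = 0
    for k in range(4):
        parity = 0
        for i in range(15):
            if ((i + 1) >> k) & 1:
                parity ^= word[i]
        s |= parity << k
    return s


def correct_data_bits(word):
    """One conditional per data bit: invert it only if the syndrome names it."""
    s = syndrome(word)
    return [word[i] ^ 1 if s == i + 1 else word[i] for i in DATA_POSITIONS]


def encode(data):
    """Hypothetical helper: place 11 data bits and fill in the 4 parity bits."""
    word = [0] * 15
    for i, bit in zip(DATA_POSITIONS, data):
        word[i] = bit
    s = syndrome(word)  # parity needed to zero each check
    for k, p in enumerate(PARITY_POSITIONS):
        word[p] = (s >> k) & 1
    return word


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
    received = encode(data)
    received[9] ^= 1  # inject a single bit error
    print(correct_data_bits(received) == data)
```

Four syndrome computations plus 11 data-bit conditionals give the 15
CONDs counted in the text.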

The algorithm compiled to cif, as expected. The
size of the Weinberger array (155 columns) required a long
time for compilation, approximately 3.5 hours (at night) on
the VAX 11/780 at Naval Postgraduate School. The resulting
labelled cifplot is shown in Figure 6.23. The circuit is a
larger expansion of the seven bit Hamming error correctors.
The seven bit chip has seven CONDs; the 15 bit chip
has 15. The result of COND in the algorithm is NOR gates in
the Weinberger array. The chip measures 5.1371 mm by 4.005
mm, for an area of 20.57 sq. mm. There are 238 pullup
transistors, so the Powest-calculated power dissipation of
0.1229 W (average) is no surprise (MacPitts estimates the
power consumption as 0.16086 W). The Powest estimated
maximum dc power is 0.2321 W. Crystal timing analysis
predicts a maximum delay of 1222.94 ns, for a maximum data
rate of 818 kHz and therefore a maximum throughput of
818,000 results/sec (8,998,000 bits/sec). The circuit
density is sparse, as seen in the cifplot, and the average
density is approximately 37 transistors/sq. mm. The sparsity
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is  due  in  part  to  the  absence  of  a  data  path.  If  just  the 
Weinberger array is considered, however, the circuit density
is approximately 100 transistors/sq. mm. Appendix D contains
the script recording of the compilation of ham15dc.mac.
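
The area, data rate, and bit throughput quoted for the 15 bit chip can
be rechecked from the chip dimensions and the Crystal delay. The sketch
below is only an arithmetic check; the function name is illustrative.
The text rounds the data rate to 818 kHz before multiplying by the 11
data bits, so the unrounded bit rate computed here is slightly lower.

```python
def chip_figures(width_mm, height_mm, delay_ns, data_bits):
    """Return (area in mm**2, data rate in Hz, bit throughput in bits/s)."""
    area = width_mm * height_mm
    rate_hz = 1e9 / delay_ns          # one result per worst-case delay
    return area, rate_hz, rate_hz * data_bits


if __name__ == "__main__":
    # Dimensions and delay as quoted for the 15 bit corrector.
    area, rate, bits = chip_figures(5.1371, 4.005, 1222.94, 11)
    print(f"area = {area:.2f} sq. mm")
    print(f"data rate = {rate / 1e3:.0f} kHz")
    print(f"throughput = {bits:.0f} bits/sec (unrounded)")
```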

The transistor densities given in Table 6.2 are
derived from MacPitts chips. A comparison with standard
library cell densities derived from Newkirk and Mathews
[Ref. 12] may be illuminating.

TABLE 6.3
TRANSISTOR DENSITY COMPARISON

CIRCUIT                              DENSITY [tran./mm**2]

Ham7Cf                               54
Ham7Cs                               46
Ham7Cr                               47
CountUDRestore [Ref. 11: p. 79]      457
COUNT [Ref. 11: p. 67]               753
ALU [Ref. 11: p. 20]                 616
ADDER [Ref. 11: p. 10]               691


So the MacPitts chips are far less dense than even
the library macro cells. The Newkirk-Mathews figures consider
only the cell itself, and not the whole chip, which was the
basis on which the MacPitts densities were calculated.
Nevertheless, a density factor of 10 is a considerable
difference (the MacPitts chips in this chapter are
approximately 50% circuitry and 50% white space, so a
density factor of five is still significant).
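
The density factors quoted above are round figures. A rough check
using the Table 6.3 entries is sketched below. Averaging the entries
is an assumption made here for illustration and is not necessarily how
the factors in the text were obtained; it yields ratios of the same
order (roughly 13 before and 6 after the white-space adjustment).

```python
# Rough check of the density factors using the Table 6.3 entries.
MACPITTS = {"Ham7Cf": 54, "Ham7Cs": 46, "Ham7Cr": 47}
LIBRARY = {"CountUDRestore": 457, "COUNT": 753, "ALU": 616, "ADDER": 691}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def density_factor(circuit_fraction=1.0):
    """Ratio of mean library-cell density to mean MacPitts density.

    circuit_fraction is the share of the MacPitts chip area that holds
    circuitry; 0.5 models the 50% white space mentioned in the text.
    """
    effective_macpitts = mean(MACPITTS.values()) / circuit_fraction
    return mean(LIBRARY.values()) / effective_macpitts


if __name__ == "__main__":
    print(f"whole chip:     {density_factor():.1f}")
    print(f"circuitry only: {density_factor(0.5):.1f}")
```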


VII.  CONCLUSION 

A.   SUMMARY 

This  thesis  has  considered  the  effects  of  syntax  on 
circuit  structure  in  the  MacPitts  silicon  compiler.  The 
combinational  logic  structure  is  explicitly  specified  by 
syntax  in  the  data  path,  and  the  appropriate  behavior 
results.  The  circuit  behavior  is  explicitly  specified  in  the 
control  path,  and  the  combinational  logic  structure  (a 
Weinberger  array)     results. 

Combinational  logic  structures  in  the  data  path  comprise 
adjoined  MacPitts  macros  (organelles).  Combinational  logic 
structure  in  the  control  path,  however,  is  always  done  in  a 
Weinberger  array.  The  poly  runs  internal  and  external  to  the 
Weinberger array make combinational logic operate more slowly
there than in the equivalent circuit in the data path.
Parallelism of logical functions is possible in MacPitts by
using the COND and PAR forms. These paralleling forms
usually equate to a speed/area tradeoff on the chip.

Sequential logic in MacPitts is implemented as a Mealy-type
FSM. The state registers store the present state, and
receive present input information from both the control path
and the sequencer tail organelle. The data path width, as
declared in the PROGRAM statement, determines the number of
states possible for the FSM. This must be determined by the
designer a priori, and explicitly stated in the PROGRAM
statement. The long poly runs between the data path and
control path make the MacPitts FSM slow
compared to the handcrafted equivalent. The 8:1 ratioed
superbuffered input pads add to this slowness, because of
the number of NOR gates one pad may have to drive in the
Weinberger array.
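
The Mealy behavior summarized above can be illustrated with a small
software model. This is a sketch, not MacPitts code. It assumes the
state is held in a register as wide as the data path, so a width of w
bits allows at most 2**w states, and the transition table shown is a
made-up example.

```python
class MealyFSM:
    """Minimal Mealy machine: output depends on present state and present input."""

    def __init__(self, width, transitions, initial=0):
        # Assumption: the state register is `width` bits wide.
        self.max_states = 2 ** width
        states = {s for (s, _) in transitions}
        states |= {nxt for (nxt, _) in transitions.values()}
        states.add(initial)
        if any(s < 0 or s >= self.max_states for s in states):
            raise ValueError("state does not fit in the declared width")
        self.transitions = transitions
        self.state = initial

    def step(self, inp):
        """Apply one input; update the state register and return the output."""
        next_state, output = self.transitions[(self.state, inp)]
        self.state = next_state
        return output


# Hypothetical example: output 1 when the present and previous inputs are both 1.
# Table format: (present state, input) -> (next state, output)
DETECT_11 = {
    (0, 0): (0, 0),
    (0, 1): (1, 0),
    (1, 0): (0, 0),
    (1, 1): (1, 1),
}

if __name__ == "__main__":
    fsm = MealyFSM(width=1, transitions=DETECT_11)
    print([fsm.step(bit) for bit in [1, 1, 0, 1, 1, 1]])
```

Declaring too narrow a width makes the constructor fail, which mirrors
the requirement that the designer fix the width a priori.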

The FSM architecture and its attendant Mealy sequencer
organelles are implicitly specified by the PROCESS
statement. Each process is an independent entity in
MacPitts, with its own organelles and wires. Processes do
not communicate internally with each other. The PROCESS form
is another method of parallelism possible in MacPitts. All
PROCESSes embraced by PROGRAM execute in parallel, at the
speed of the slowest-executing process. This capability
makes MacPitts well-suited for design of controller-oriented
chips.

The chip design process with MacPitts can be understood
initially as algorithmic optimization. The test algorithm is
written, tested in the interpreter, and compiled to cif.
Then an expanded version of the test algorithm is written
and tested in the interpreter. The expanded version is
compiled to cif, a circuit extraction is made, and the
electrical characteristics and speed of the chip are
determined. Alternate solutions are then considered, and
tested in the same fashion. The best of these is chosen as
the archetype for the desired chip. The archetype must have
sufficiently few signals, ports, registers, and flags to
permit testing in the interpreter (a maximum of 36). The
algorithm is then expanded again to cover the desired chip
function. The final algorithm is compiled to cif, a circuit
extraction is made, and then the chip is tested
electrically. If there are too many variables to permit
command interpreter display, the algorithm is tested with a
switch-level simulator (this exercises both the algorithm
and the circuit). Further analyses with a power estimator
and a timing analyzer are done to see that the chip operates
within specifications. If the chip operates too slowly,
parallelism should be applied to the algorithm where
possible, in an attempt to gain speed at the cost of
silicon area.

B.    RECOMMENDATIONS 

This thesis also investigated a number of MacPitts
errors and shortcomings. The following recommendations
should be considered:


1.  Have the light controller chips fabricated by
MOSIS for testing at Naval Postgraduate School, and
compare with the results from Crystal.


2.  The Weinberger array errors as depicted in
Chapter II are thought to result from incorrect
installation of MacPitts under Unix 4.2. It would
be fruitful to search for a Unix-dependent roundoff
error in the instantiation of
partial-gate-input-ground-right and
partial-gate-input-ground-left.
The poly interconnections between data and control
also suffer a lateral displacement/gap error, and
the solution to the partial gate problem is likely
to solve this one also. Similar errors were also
noted in the data path, usually between vertical
metal lines and horizontal Vdd/GND busses.


3.  New Mead-Conway organelles (cf. Chapter III) should
be tried as replacements for the MacPitts data path
organelles. This will require comparison between
similar structures with Powest and Crystal, and
selection of the better circuit. MacPitts will
connect the new organelles properly if the pitch is
preserved.


4.  The error of shorted flag traces occurs almost
every time a flag is declared. The vertical flag
lines intersect the horizontal clock traces at a
via cut, which shorts the flag signal and does not
permit it to pass to control. This error is best
solved by a conditional test in the
routing algorithm. If the flag traces run close to
the Vdd/ground comb, then the traces must be moved
in towards the center of the chip.


5.  The possibility of replacing the slow Weinberger
array with a PLA should be considered. This
solution will entail a complete rewrite of the
control.lisp source file, and major modification to
other files which depend on or interact with
control.lisp. A study of plague and plagen (or
eqntott and tpla) is the best place to start, with
a view towards replacing the Weinberger array with
a compact PLA. The difficulty will lie in the
interface between the PLA logic equation
specification (in plague or eqntott) and the
MacPitts algorithmic language.


6.  The problem of vestigial instantiation (sequencers,
unconnected vertical poly runs from the data path)
could be solved with a simple test using list
processing primitives. If the organelles or wires
are not needed, then skip the instantiation
process.


7.  The problem of the unconnected Vdd bus only occurs
in very small chips, but should be simple to
correct. A metal routing up and to the left, to
connect to the Vdd comb, is required. The simple
solution is to explicitly specify a connecting wire
in the CLL-like language used in the MacPitts
source code. The more instructive solution is to
write the Franz LISP code to decide if a jumper
wire is needed, and if so, to create one.


8.  A menu invoking Crystal, Esim, Powest, and Mextra
would speed up the design cycle. The menu could be
incorporated in MacPitts, but would probably be
just as good external to MacPitts. A timing
analysis is necessary in the compilation of the
chip, however. If it had existed during the Hamming
15/4 error corrector example (Chapter VI), the
choice of an archetype chip would have been
simpler.


9.  The VT-100 terminal screen is too small to display
the interpreter session of all the signals, flags,
registers, and ports which occur on even a
moderate-sized MacPitts chip. A windowing
capability is needed. The source file
interpret.lisp contains the command interpreter
logic. The interpreter is functionally a dynamic
debugger, similar to those in CP/M or VMS (but
without the ability to change the source code). The
interpreter has a very slow response time to
terminal inputs for all but the simplest chip
algorithms, and it would be useful to speed it up
also if other modifications are planned.


10.  SPICE would be a valuable addition to timing
analysis. Currently, SPICE 2g6 is not installed on
the VAX-11/780 at Naval Postgraduate School. A plot
of the SPICE output is also desired, but not
available under the currently installed version of
Unix 4.2.


11.  The  capability  to  scale  the  MacPitts  designs  to 
sizes  other  than  multiples  of  200  or  250 
centimicrons  is  needed  for  future  applications.  The 
ability  to  scale  in  multiples  of  25  centimicrons  is 
suggested,  where  the  designer  chooses  the  option  at 
compile  time  in  the  MacPitts  '(options)  field. 

12.  MacPitts currently places pads on only three sides
of the chip frame. A better design would permit
pads to be placed on all four sides of the chip.
This would also allow faster chips, due to
shortened inter-chip wires.


13.  The capability of automatic test vector generation
and evaluation is lacking. The command interpreter
should be able to access an existing file for
testing and write the results of the tests to
another file.


14.  The  ability  to  display  transistor  density  as  one  of 
the  compiler  statistics  should  be  incorporated. 
This  would  be  a  simple  task,  since  MacPitts  already 
computes  the  chip  dimensions  and  the  number  of 
transistors,  and  writes  each  of  these  values  to  the 
statistics  output  file. 


15.  A serial implementation of the Hamming 15/4 error
detector/corrector should be attempted using
primitive polynomials [Ref. 13], [Ref. 5: pp. 200].
The throughput should be compared to the parallel
15/4 error corrector. The interesting problem is to
solve the differing bandwidths at the input and
output of the shift register. MacPitts may not be
able to cope with this requirement, and will likely
be slower than the parallel architecture (in the
throughput sense) regardless.

16.  A MacPitts prototype FIR or IIR digital filter
should be attempted. The first model should be an
FIR four-bit prototype, and this algorithm can then
be expanded to the floating point version of larger
word length. An excellent reference for the
designer is [Ref. 14: pp. 541], where the
algorithmic aspects of digital filter design are
explained.


17.  Faster graphics are required for the VLSI graphics
terminal (Caesar). A better (i.e., quicker)
terminal should be considered.


18.  The Backus-Naur form (BNF) file included with the
MacPitts source code specifies allowed algorithmic
syntax. The macro and lambda forms should be
investigated with a view to incorporating macros
into the algorithms.


19.  It would speed up the design time and confer added
versatility on MacPitts if the input port width
could be specified as a variable. The word lengths
would then be assigned according to another single
statement in the MacPitts algorithm. For instance

(def face port input (*))
(def data word width 16)

would assign a 16-bit width to the variable <face>,
and to any other occurrences of the asterisk.
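
The density statistic proposed in Recommendation 14 requires only the
chip dimensions and the transistor count. A minimal sketch of the
calculation follows. The function name is hypothetical, and the figures
in the example are the dimensions and transistor count reported in the
compilation script for the data path five input AND gate.

```python
def transistor_density(width_mm, height_mm, transistors):
    """Transistors per square millimeter of total chip area."""
    if width_mm <= 0 or height_mm <= 0:
        raise ValueError("chip dimensions must be positive")
    return transistors / (width_mm * height_mm)


if __name__ == "__main__":
    # Dimensions (mm) and transistor count taken from the compiler statistics.
    density = transistor_density(1.805, 1.8725, 98)
    print(f"Density is {density:.1f} transistors/sq. mm")
```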
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-4  (<( port- 1  nput  a) 
( ( (  Internal  4  )  )  )  )  ) 


(port-fnput  e)M) 
(  Interna  1  1  )  )  )  ) 
{  Internal  2  )  )  )  ) 
(  1 nterna 1  3  >  )  )  > 


phlc  )  ) 
hlb  >  ) 
h  la  )  ) 
round )  ) 
power ) ) 
nput  (a 
nput  ( b 
nput  ( c 
nput  ( d 
nput 


(pi 

(9> 

(; 

(Input  (a  0)  (port-Input  a  0)) 

(Input  (b  0)  (port-Input  b  0>) 

(Input  (c  0)  (port-Input  c  0>) 

(Input  (d  0)  (port-Input  d  0)) 

(Input  (e  0)  (port-Input  e  0)) 
(outputs  (z  0)  (port-output  z  0))))) 


Data  Path  Five  Input  AND  Gate  .obj  File 
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Stat 

Statistic - for project fiveand
Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - 68, 58 - Reading source file - fiveand.mac
Herald - 72, 58 - Reading library from - /vlsi/macpit/library
Herald - 901, 611 - Processing definitions
Herald - 903, 611 - Evaluating evals
Herald - 986, 611 - Expanding macros
Herald - 989, 611 - Extracting sources
Herald - 990, 611 - Extracting destinations
Herald - 991, 611 - Extracting labels
Herald - 991, 611 - Extracting sequencers
Herald - 991, 611 - Extracting flags, data-path, control, and pins
Statistic - Maximum control depth is 0
Statistic - Number of gates is 0
Statistic - Data-path has 5 Units
Herald - 1383, 901 - Outputing .obj file
Herald - 1413, 901 - Extruding gates
Statistic - Control has 0 columns
Herald - 1516, 997 - Extruding straps
Statistic - Circuit has 98 transistors
Statistic - Control has 0 tracks
Statistic - Power consumption is 0.038114 Watts
Herald - 1679, 1095 - Laying out data-path
Herald - 1815, 1192 - Organelle unit# 1 bit 0
Herald - 2014, 1290 - Organelle unit# 2 bit 0
Herald - 2168, 1391 - Organelle unit# 3 bit 0
Herald - 2332, 1498 - Organelle unit# 4 bit 0
Herald - 2385, 1498 - Organelle unit# 5 bit 0
Statistic - Data-path internal bus uses 6 tracks
Herald - 2539, 1600 - Laying out control
Herald - 2542, 1600 - Laying out flags
Herald - 2543, 1600 - Laying out river
Herald - 2545, 1600 - Laying out wing
Herald - 2547, 1600 - Laying out skeleton
Herald - 2683, 1699 - Laying out pins
Statistic - Dimensions are 1.805000 mm by 1.872500 mm
Herald - 5299, 3105 - Outputing .cif file
Statistic - Memory used - 357K
Statistic - Compilation took 1.534722 CPU minutes
Statistic - Garbage collection took 0.893333 CPU minutes
Statistic - For a total of 33 garbage collections


Script of Compilation of Data Path Five Input AND Gate


94  41  64200  79400; 
94  42  82200  79400; 
94  43  100200  79400; 
94  a  46300  79600; 
94  41  64200  79600; 
94  42  82300  79600; 
94  43  100300  79600; 
94  54  48000  79900; 
94  41  54200  79900: 
94  55  66000  79900: 
94  42  72200  79900 
94  56  84000  79900; 
94  43  90200  79900; 
94  57  102000  79900; 
94  z  108200  79900; 
94  54  49800  80400: 
94  41  55500  80400 
94  55  67800  80400 
94  42  73500  80400 
94  56  85800  80400: 
94  43  91500  80400; 
94  57  103800  80400; 
94  z  109500  80400; 
94  a  46300  80400; 
94  41  64300  80400; 
94  42  82300  80400: 
94  43  100300  80400; 
94  Vdd  52000  80600; 
94  Vdd  57700  80600; 
94  Vdd  70000  80600; 
94  Vdd  75700  80C00; 
94  Vdd  88000  80600: 
94  Vdd  93700  80600; 
94  Vdd  106000  80600; 
94  Vdd  11 1700  80600; 
94  54  49800  81600; 
94  41  55500  81600 
94  55  67800  81600: 
94  42  73500  81600; 
94  56  85800  81600 
94  43  91500  81600; 
94  57  103800  81600; 
94  z  109500  81600; 
94  54  49800  82400: 
94  41  55500  82400: 
94  55  67800  82400 
94  42  73500  82400 
94  56  85800  82400: 
94  43  91500  82400; 
94  57  103800  82400; 
94  z  109500  82400: 
94  z  1 16500  83600; 
94  e  97200  84900; 
94  z  109400  84900; 
94  z  113500  84900: 
94  d  79200  86100; 
94  43  91400  86100: 
94  43  95500  86100: 


94  GND  41500  71700;       94 

C  S1200  87400; 

94  Vdd  52000  76800;       94 

42  73400  87400; 

94  Vdd  57700  76800;       94 

42  77500  87400; 

94  Vdd  70000  76800;       94 

b  43200  88600; 

94  Vdd  75700  76800;       94 

41  55400  88600: 

94  Vdd  88000  76800;       94 

41  59500  88600; 

94  Vdd  93700  76800;       94 

a  41500  89900; 

94  Vdd  106000  76800; 

94  Vdd  111700  76800; 

94  b  43200  76900 

94  c  61200  76900 

94  d  79200  76900 

94  e  97200  76900 

94  z  113500  76900; 

94  b  43200  76900 

94  GND  48000  76900; 

94  c  61200  76900 

94  GND  66000  76900; 

94  d  79200  76900 

94  GND  84000  76900; 

94  e  97200  76900 

94  GND  102000  76900; 

94  z  I  13500  76900; 

94  b  46300  77100 

94  c  64300  77100 

94  d  82300  77100 

94  e  100300  7710f 

1; 

94  b  45000  77100 

94  c  63000  77100 

94  d  81000  77100 

94  e  99000  77100 

94  b  46300  77800 

94  c  64300  77800 

94  d  82300  77800 

94  e  100300  7780( 

1: 

94  GND  53700  7 8 1 i 

12; 

94  GND  71700  7811 

12; 

94  GND  89700  78U 

12; 

94  GND  107700  78 

100; 

94  a  41500  78600 

94  41  59500  78601 

r; 

94  42  77500  78601 

1; 

94  43  95500  78601 

1; 

94  a  41500  78600 

94  45  48000  78601 

J; 

94  41  59500  78601 

J: 

94  47  66000  78601 

1; 

94  42  77500  78601 

1; 

94  49  84000  78601 

J; 

94  43  95500  78601 

J; 

94  51  102000  786J 

70; 

94  z  1 16500  7890 

7; 

94  z  116500  7890J 

7; 

94  54  53200  7930 

J; 

94  55  71200  7930J 

3; 

94  56  89200  79301 

3: 

94  57  107200  7931 

30; 

94  a  46200  79400 

; 
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Crystal 
»  build 
[0:00. 1 
»  Input 
[ 0 : 00 . 0 
:  outpu 
10:00 .0 
:  del  ay 
Mark  1 ng 
Setting 
Setting 
(9  stag 
[ 0 : 00 . 1 
t  del  ay 
( 1  stag 
10:00.0 
t  de 1  ay 
( 1  stag 
10:00 .0 
t  delay 
{ 1  stag 
10:00.0 
:  delay 
{ 1  stag 
[0:00.0 
t  cr It  1 
Node  z 

57* 

43* 

56* 

42* 

55* 

41 

54* 


,  v.2 

Sander . s 
u  0 1 00 . 2  s 
s  a  b  c  d 
u  0:00.0s 
ts  z 

u  0:00.0s 
a  -1  0 
trans  1st 
Vdd  to  1 
GND  to  0 
es  examln 
u  0 : 00 . 1 s 

b  -1  0 
es  exam  1 n 
u  0:00.0s 

c  -1  0 
es  exam  1 n 
u  0:00.0s 

d  -1  0 
es  exam  1 n 
u  0:00.0s 

e  -1  0 
es  exam  1 n 
u  0:00.0s 
cal 

Is  dr 1 ven 
. . through 
Is  dr 1 ven 
. . through 
Is  dr 1 ven 
. . through 
Is  dr I ven 
. . through 
Is  dr 1 ven 
. . through 
Is  driven 
. . through 
Is  dr 1 ven 
. . through 
Is  driven 
. . through 
s  dr 1 ven 
U  0 : 00 . 1 S 
ca 1  -g  5a 
u  0 : 00 . 1 s 


1m 
21k] 

e 
30k] 

30k] 

or  flow. 


ed.  ) 
31k] 

ed.  ) 
31k] 

ed.  ) 
31k] 

ed.  > 
31k] 

ed.  ) 
31k] 


1  ow 

fet 

h  Igh 

fet 

1  ow 

fet 

high 

fet 

1  ow 

fet 

h  Igh 

fet 

1  ow 

fet 

high 

fet 

1  ow  a 
31k] 

ndcr  . 
36k] 


at  8 

at  < 

at 
at  ( 
at  6 
at  ( 

at 
at  ( 
at  4 
at  ( 

at 
at  ( 
at  2 
at  ( 

at 

at  < 

t  0. 


6.01ns 
541,  397) 
70.55ns 
519,  405) 
1 .39ns 
451,  397) 
50. 40ns 
429,  405) 
1  .22ns 
361,  397) 
29.99ns 
339,  405) 
0.81ns 
271,  397) 
9.40ns 
249,  405) 
00ns 


to  GND  after 
to  Vdd  after 
to  GND  after 
to  Vdd  after 
to  GND  after 
to  Vdd  after 
to  GND  after 
to  Vdd  after 


a  1 
[0:00. 1 1 

:     critical     -g    5andcr.dum 
[ 0 : 00 . 1 1 
:     quit 

[0:00. 4u    0:00.4s    36k]    Crystal    done, 
%    "D 
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vis  ♦! 
push 541 397 2 2
paint e
label [8]86.0ns,fall
push 519 405 2 2
paint e
label [7]70.6ns,rise
push 451 397 2 2
paint e
label [6]61.4ns,fall
push 429 405 2 2
paint e
label [5]50.4ns,rise
push 361 397 2 2
paint e
label [4]41.2ns,fall
push 339 405 2 2
paint e
label [3]30.0ns,rise
push 271 397 2 2
paint e
label [2]20.8ns,fall
push 249 405 2 2
paint e
label [1]19.4ns,rise
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X  powest  -p  < a5andcr . s  1  m 

gamma=0.4V**.5, tox=9e-08m, u0=0.08m**2/V-s
vdd=5V, vtd=-3.5V, vte=0.8V, vsb=2V

#devs    Pdc_avg (W)    Pdc_max (W)    type
         0.000000       0.000000       enhancement pullups
         0.000940       0.001879       depletion pullups
         0.000000       0.000000       special depletion pullups
         0.000940       0.001879       TOTAL


Data Path Five Input AND Powest Analysis
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(( 


( ( 
( 


est  In 
our  ce 
ource 
our  ce 
ource 
ource 
ogo  f 
ord-1 
round 
1  gna  1 
1  gna  1 
1  gna  1 
1  gna  1 
1  gna  1 
1  gna  1 
h1a  2 
nib  3 
hie  4 
ower 


slgna 

gate 

nor 

{  (pr 

<pr 

(pr 

(pr 

(pr 

gate 

nor 

(  (pr 

(pr 

(pr 

(pr 

(pr 

(pr 

(pr 

(pr 

(pr 

gate 

nor 

<  (pr 
(Pr 
(pr 
(pr 
(pr 
(pr 
(pr 
(pr 

gate 
nor 

<  (pr 
(pr 
(pr 
(pr 
(pr 


at  Ion  z  ) 
a) 

b  ) 
c) 
d  ) 
e) 

I  veand ) 
ength  1  ) 

1  ) 

a  1  nput 
1  nput 
1  nput 
1  nput 
1  nput 
output 

) 

) 

) 

II  )  ) 


5) 
6) 

7  ) 

8  ) 
9) 

10) 


1 -output  z ) 

10) 


(nor  ((primitive  (gate  10))))) 


mlt 
mlt 
mlt 
mlt 
mlt 
9) 

mlt 
mlt 
mlt 
mlt 
mlt 
mlt 
mlt 
mlt 
ml  t 
8  ) 

mlt 
mlt 
m  1 1 
ml  t 
mlt 
mlt 
mlt 
ml  t 
7) 

mlt 
mlt 
ml  t 
mlt 
m  It 


ve 
ve 
ve 
ve 
ve 


ve 
ve 
ve 
ve 
ve 
ve 
ve 
ve 
ve 


ve 
ve 
ve 
ve 
ve 
ve 
ve 
ve 


ve 
ve 
ve 
ve 

ve 


gate 
gate 
gate 
gate 
gate 


9)  ) 
8)  ) 
7)  ) 
6)  ) 
5)  )  ))  ) 


gate  4 ) > 
gate  3 ) ) 
gate  2  )  ) 
gate  1 ) > 
gate  0) ) 
s  1  gna  1  -  I  nput 
s  1  gna 1  -  1 nput 
s  I  gna  1  -  1  nput 
s  1  gna 1  -  1  nput 


gate  4 ) ) 
gate  3  )  ) 
gate  2  >  ) 
gate  1  )  ) 
gate  0  >  ) 
s  1  gna 1  -  1  nput 
signal-Input 
signal-Input 


gate 
gate 
gate 
gate 
gate 


4)  ) 
3)  ) 
2)  ) 

1  )  ) 
0)  ) 


a)) 
b)  ) 
c  )  ) 
d)  )  )  )  ) 


a)  ) 
b  >  ) 
c  >  )  )  )  ) 
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{ 

( 

(  (ga 

( no 

(  ( 

( 

( 

( 

( 

( 

(  (ga 

(  no 

(  ( 

{ 

( 

( 

( 

((ga 

(  (ga 

(  (ga 

(  (ga 

(  (ga 


pr 1m  It 
pr 1 m 1 t 
te  6) 
r 

pr 1 m 1 t 
pr  t m  i  t 
pr 1 m  1 1 
pr 1m 1 t 
pr Imlt 
pr  Imlt 
te  5) 
r 

pr  Imlt 
pr Imlt 
pr 1 m  f  t 
pr Imlt 
pr  1  m 1 1 
te  4  ) 
te  3> 


ive  ( s  Ignal- 1 nput  a)) 

1  ve  ( s  Ignal -  Input  b  >  )  )  )  ) 


1  ve 

J  ve 
1  ve 
1  ve 
1  ve 
1  ve 


(  gate 

(  gate 
(  gate 
(  gate 
(gate 


4)  ) 
3)  ) 
2)  ) 
1  )  ) 
0)  ) 


( s  Igna 1  -  1 nput  a  )  )  )  )  ) 


te 
te 
te 


2> 
1  ) 
0) 


1  ve 
1  ve 
1  ve 
1  ve 
1  ve 
(  nor 
(  nor 
(  nor 
(  nor 
(  nor 


(  (4 
(3 
(2 
(  1 
(  1  1 
(9 
(8 
(7 
(6 
(5 
(  10 


ph  Ic  )  ) 
ph  lb )  ) 
ph  la ) ) 
ground 
( power 
I  nput 
1  nput 
Input 
Input 
1  nput 
( outpu 


4  )  ) 

3)  > 

2)  ) 

1  )  ) 

0)  )  )  )  ) 
(primitive 
( pr 1m  ft  Ive 
( pr 1m  1 1 1  ve 
( pr  I  m 1 1 1 ve 
( pr 1 m 1 t 1 ve 


(  gate 
( gate 
(gate 
(gate 
(  gate 


( s  Igna 1  -  1 nput  a ) ) 
(signal-Input  b  )  ) 
(signal-Input  cM 
(signal-Input  d)) 
(signal-Input  e>) 


>  > 

)  > 

e 
d 
c 
b 
a 
t8 


( s 1 gna 1  -  1 nput  e  )  ) 
(signal-Input  d>) 
( s 1 gna 1  -  1 nput  c  )  ) 
( s 1 gna 1  -  1  nput  b  )  ) 
(signal-Input  a)) 
z  ( s 1 gna 1 -output  2))))) 
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Script  started  on  Mon  Apr 
15 22:29:07 1985
% macpitts fiveand herald
Statistic - for project fiveand
Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - 63, 55 - Reading source file - fiveand.mac
Herald - 70, 55 - Reading library from - /vlsi/macpit/library
Herald - 896, 604 - Processing definitions
Herald - 898, 604 - Evaluating evals
Herald - 983, 604 - Expanding macros
Herald - 1103, 701 - Extracting sources
Herald - 1108, 701 - Extracting destinations
Herald - 1110, 701 - Extracting labels
Herald - 1110, 701 - Extracting sequencers
Herald - 1111, 701 - Extracting flags, data-path, control, and pins
Statistic - Maximum control depth is 4
Statistic - Number of gates is 12
Statistic - Data-path has 0 Units
Herald - 1946, 1286 - Outputing .obj file
Herald - 2002, 1286 - Extruding gates
Statistic - Control has 17 columns
Herald - 4031, 2417 - Extruding straps
Statistic - Circuit has 136 transistors
Statistic - Control has 11 tracks
Statistic - Power consumption is 0.040723 Watts
Herald - 4183, 2517 - Laying out data-path
Statistic - Data-path internal bus uses 0 tracks
Herald - 4186, 2517 - Laying out control
Herald - 4997, 2943 - Laying out flags
Herald - 4999, 2943 - Laying out river
Herald - 5000, 2943 - Laying out wing
Herald - 5018, 2943 - Laying out skeleton
Herald - 5054, 2943 - Laying out pins
Statistic - Dimensions are 1.772500 mm by 1.905000 mm
Herald - 7361, 4042 - Outputing .cif file
Statistic - Memory used - 349K
Statistic - Compilation took 2.106111 CPU minutes
Statistic - Garbage collection took 1.153889 CPU minutes
Statistic - For a total of 41 garbage collections
% ^D
script done on Mon Apr 15 22:34:42 1985
94  Vdd  41000  46000

94  Vdd  45200  47700 

94  Vdd  48200  47700 

94  Vdd  50400  47700 

94  Vdd  53400  47700 

94  Vdd  55700  47700 

94  Vdd  58700  47700 

94  Vdd  62700  47700 

94  Vdd  65700  47700

94  Vdd  67900  47700 

94  Vdd  73200  47700 

94  Vdd  78400  47700 

94  Vdd  81400  47700 

94  14  45000  48900 

94  15  48000  48900 

94  16  50200  48900 

94  2  53200  48900; 

94  18  55500  48900 

94  19  58500  48900 

94  20  62500  48900 

94  21  65500  48900 

94  22  67700  48900 

94  23  73000  48900 

94  24  78200  48900 

94  25  81200  48900 

94  14  45500  51200 

94  15  48500  51200 

94  16  50700  51200 

94  2  53700  51200; 

94  18  56000  51200 

94  19  59000  51200 

94  20  63000  51200 

94  21  66000  51200

94  22  68200  51200 

94  23  73500  51200 

94  24  78700  51200 

94  25  81700  51200 

94  14  45000  52900 

94  15  48500  52900 

94  16  50200  52900 

94  2  53700  52900; 

94  18  55500  52900 

94  19  59000  52900 

94  20  62500  52900 

94  21  66000  52900 

94  22  67700  52900 

94  23  73000  52900 

94  24  78200  52900 

94  25  81700  52900 

94  d  71200  53900; 

94  22  67700  53900 

94  d  71200  53900; 

94  GND  48500  54700 

94  GND  78700  54700 

94  GND  81700  54700 

94  GND  46700  54900 

94  GND  52000  54900 

94  GND  57200  54900 


94  GND  64200  54900 
94  GND  69500  54900 
94  GND  80000  54900 
94  GND  46700  54900
94  GND  52000  54900 
94  GND  57200  54900 
94  GND  64200  54900 
94  GND  69500  54900 
94  GND  80000  54900 
94  15  48500  55900; 
94  25  81700  55900; 
94  2  53700  56700;
94  18  56000  56700 
94  19  59000  56700 
94  20  63000  56700 
94  22  68200  56700 
94  24  78700  56700 
94  16  50200  57900 
94  GND  56000  58700 
94  GND  59000  58700 
94  GND  63000  58700 
94  GND  68200  58700 
94  GND  78700  58700 
94  GND  52000  58900 
94  GND  57200  58900 
94  GND  64200  58900 
94  GND  69500  58900 
94  GND  80000  58900 
94  a  41500  59900; 
94  a  41500  59900; 
94  23  73000  59900 
94  16  50700  60700 
94  18  56000  60700 
94  19  59000  60700 
94  20  63000  60700 
94  22  68200  60700 
94  24  78700  60700 
94  14  45000  61900 
94  GND  56000  62700 
94  GND  59000  62700 
94  GND  63000  62700 
94  GND  68200  62700 
94  GND  78700  62700 
94  GND  46700  62900 
94  GND  57200  62900 
94  GND  64200  62900 
94  GND  69500  62900 
94  GND  80000  62900 
94  b  43200  63900; 
94  b  43200  63900; 
94  14  45500  64700 
94  18  56000  64700 
94  20  63000  64700 
94  22  68200  64700 
94  24  78700  64700 
94  19  59000  65000 
94  21  66000  65900 
94  GND  56000  66700 


94  GND  59000  66700 
94  GND  63000  66700 
94  GND  68200  66700 
94  GND  78700  66700 
94  GND  74700  66900 
94  GND  46700  66900 
94  GND  57200  66900 
94  GND  64200  66900 
94  GND  69500  66900 
94  GND  74700  66900 
94  GND  80000  66900 
94  e  76500  67900; 
94  19  59000  67900; 
94  e  76500  67900; 
94  15  48500  68700 
94  20  63000  68700 
94  22  68200  68700 
94  23  73500  68700 
94  24  78700  68700
94  21  66000  69000 
94  c  60700  69900; 
94  18  55500  69900;
94  c  60700  69900; 
94  GND  48500  70700 
94  GND  63000  70700 
94  GND  66000  70700 
94  GND  78700  70700 
94  GND  46700  70900 
94  GND  64200  70900
94  GND  80000  70900 
94  24  78200  71900
94  15  48500  72700 
94  20  62500  73900 
94  GND  48500  74700 
94  GND  46700  74900 
94  a  41500  75900 
94  b  43200  75900 
94  2  53700  75900 
94  c  60700  75900 
94  d  71200  75900 
94  e  76500  75900 
APPENDIX  B


CHAPTER  IV  LISTINGS 


Statistic - for project gc
Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - 64, 57 - Reading source file - gc.mac
Herald - 69, 57 - Reading library from - /vlsi/macpit/library
Herald - 911, 622 - Processing definitions
Herald - 912, 622 - Evaluating evals
Herald - 996, 622 - Expanding macros
Herald - 1009, 622 - Extracting sources
Herald - 1012, 622 - Extracting destinations
Herald - 1108, 716 - Extracting labels
Herald - 1108, 716 - Extracting sequencers
Herald - 1110, 716 - Extracting flags, data-path, control, and pins
Statistic - Maximum control depth is 4
Statistic - Number of gates is 26
Statistic - Data-path has 7 Units
Herald - 2625, 1722 - Outputing .obj file
Herald - 2716, 1722 - Extruding gates
Statistic - Control has 31 columns
Herald - 8491, 4785 - Extruding straps
Statistic - Circuit has 280 transistors
Statistic - Control has 12 tracks
Statistic - Power consumption is 0.055910 Watts
Herald - 8910, 4993 - Laying out data-path
Herald - 9070, [illegible] - Organelle unit# [illegible] bit [illegible]
Herald - 9263, [illegible] - Organelle unit# [illegible] bit [illegible]
Herald - 9318, [illegible] - Organelle unit# [illegible] bit [illegible]
Herald - 9549, [illegible] - Organelle unit# [illegible] bit [illegible]
Herald - 9636, [illegible] - Organelle unit# [illegible] bit [illegible]
Herald - 9784, [illegible] - Organelle unit# [illegible] bit [illegible]
Herald - 9846, [illegible] - Organelle unit# [illegible] bit [illegible]
Herald - 10274, 5652 - Organelle unit# [illegible] bit [illegible]
Herald - 10470, 5765 - Organelle unit# [illegible] bit [illegible]
Herald - 10509, 5765 - Organelle unit# [illegible] bit [illegible]
Herald - 10578, 5765 - Organelle unit# [illegible] bit [illegible]
Herald - 10801, 5876 - Organelle unit# [illegible] bit [illegible]
Herald - 10997, 5989 - Organelle unit# [illegible] bit [illegible]
Herald - 11014, 5989 - Organelle unit# [illegible] bit [illegible]
Statistic - Data-path internal bus uses 3 tracks
Herald - 11096, 5989 - Laying out control
Herald - 13020, [illegible] - Laying out flags
Herald - 13023, [illegible] - Laying out river
Herald - 13168, [illegible] - Laying out wing
Herald - 13177, [illegible] - Laying out skeleton
Herald - 13262, [illegible] - Laying out pins
Statistic - Dimensions are 2.587500 mm by 1.982500 mm
Herald - 15882, 8254 - Outputing .cif file
Statistic - Memory used - 403K
Statistic - Compilation took 4.487778 CPU minutes
Statistic - Garbage collection took 2.328889 CPU minutes
Statistic - For a total of 79 garbage collections


GC.script
Statistic - for project gc2
Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - 61, 54 - Reading source file - gc2.mac
Herald - 64, 54 - Reading library from - /vlsi/macpit/library
Herald - 882, 596 - Processing definitions
Herald - 884, 596 - Evaluating evals
Herald - 967, 596 - Expanding macros
Herald - 986, 596 - Extracting sources
Herald - 1084, 692 - Extracting destinations
Herald - 1086, 692 - Extracting labels
Herald - 1087, 692 - Extracting sequencers
Herald - 1090, 692 - Extracting flags, data-path, control, and pins
Statistic - Maximum control depth is 4
Statistic - Number of gates is 27
Statistic - Data-path has 8 Units
Herald - 2661, 1695 - Outputing .obj file
Herald - 2766, 1695 - Extruding gates
Statistic - Control has 32 columns
Herald - 9213, 5045 - Extruding straps
Statistic - Circuit has 288 transistors
Statistic - Control has 13 tracks
Statistic - Power consumption is 0.057477 Watts
Herald - 9651, 5249 - Laying out data-path
Herald - 9822, 5356 - Organelle unit# 1 bit 1
Herald - 10022, 5464 - Organelle unit# 1 bit 0
Herald - 10072, 5464 - Organelle unit# 2 bit 1
Herald - 10114, 5464 - Organelle unit# 2 bit 0
Herald - 10270, 5571 - Organelle unit# 3 bit 1
Herald - 10503, 5684 - Organelle unit# 3 bit 0
Herald - 10585, 5694 - Organelle unit# 4 bit 1
Herald - 10718, 5792 - Organelle unit# 4 bit 0
Herald - 10755, 5792 - Organelle unit# 5 bit 1
Herald - 11169, 6017 - Organelle unit# 5 bit 0
Herald - 11254, 6017 - Organelle unit# 6 bit 1
Herald - 11422, 6128 - Organelle unit# 6 bit 0
Herald - 11494, 6128 - Organelle unit# 7 bit 1
Herald - 11723, 6241 - Organelle unit# 7 bit 0
Herald - 11916, 6353 - Organelle unit# 8 bit 1
Herald - 11936, 6353 - Organelle unit# 8 bit 0
Statistic - Data-path internal bus uses 3 tracks
Herald - 12034, 6353 - Laying out control
Herald - 14219, 7417 - Laying out flags
Herald - 14224, 7417 - Laying out river
Herald - 14374, 7534 - Laying out wing
Herald - 14383, 7534 - Laying out skeleton
Herald - 14478, 7534 - Laying out pins
Statistic - Dimensions are 2.687500 mm by 1.982500 mm
Herald - 17205, 8788 - Outputing .cif file
Statistic - Memory used - 408K
Statistic - Compilation took 4.823334 CPU minutes
Statistic - Garbage collection took 2.441111 CPU minutes
Statistic - For a total of 83 garbage collections


Gc2.scr
Statistic - for project stop
Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - 63, 56 - Reading source file - stop.mac
Herald - 74, 56 - Reading library from - /vlsi/macpit/library
Herald - 877, 588 - Processing definitions
Herald - 878, 588 - Evaluating evals
Herald - 961, 588 - Expanding macros
Herald - 1088, 681 - Extracting sources
Herald - 1094, 681 - Extracting destinations
Herald - 1102, 681 - Extracting labels
Herald - 1102, 681 - Extracting sequencers
Herald - 1107, 681 - Extracting flags, data-path, control, and pins
Statistic - Maximum control depth is 5
Statistic - Number of gates is 37
Statistic - Data-path has 3 Units
Herald - 2983, 1885 - Outputing .obj file
Herald - 3104, 1885 - Extruding gates
Statistic - Control has 43 columns
Herald - 17705, 9477 - Extruding straps
Statistic - Circuit has 268 transistors
Statistic - Control has 14 tracks
Statistic - Power consumption is 0.054698 Watts
Herald - 18256, 9790 - Laying out data-path
Herald - 18279, 9790 - Organelle unit# 1 bit 1
Herald - 18773, 10113 - Organelle unit# 1 bit 0
Herald - 18830, 10113 - Organelle unit# 2 bit [illegible]
Herald - 19001, 10220 - Organelle unit# 2 bit [illegible]
Herald - 19075, 10220 - Organelle unit# 3 bit [illegible]
Herald - 19091, 10220 - Organelle unit# 3 bit [illegible]
Statistic - Data-path internal bus uses 2 tracks
Herald - 19244, 10327 - Laying out control
Herald - 21284, 11356 - Laying out flags
Herald - 21286, 11356 - Laying out river
Herald - 21307, 11356 - Laying out wing
Herald - 21333, 11356 - Laying out skeleton
Herald - 21382, 11356 - Laying out pins
Statistic - Dimensions are 2.107500 mm by 2.207500 mm
Herald - 24464, 12791 - Outputing .cif file
Statistic - Memory used - 403K
Statistic - Compilation took 6.877223 CPU minutes
Statistic - Garbage collection took 3.587222 CPU minutes
Statistic - For a total of 123 garbage collections


Lc2.scr
Statistic - for project b4
Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - 65, 53 - Reading source file - b4.mac
Herald - 74, 53 - Reading library from - /vlsi/macpit/library
Herald - 898, 596 - Processing definitions
Herald - 899, 596 - Evaluating evals
Herald - 980, 596 - Expanding macros
Herald - 1106, 686 - Extracting sources
Herald - 1113, 686 - Extracting destinations
Herald - 1118, 686 - Extracting labels
Herald - 1118, 686 - Extracting sequencers
Herald - 1121, 686 - Extracting flags, data-path, control, and pins
Statistic - Maximum control depth is [illegible]
Statistic - Number of gates is 53
Statistic - Data-path has 10 Units
Herald - 4032, 2550 - Outputing .obj file
Herald - 4243, 2550 - Extruding gates
Statistic - Control has 63 columns
Herald - 25458, 12382 - Extruding straps
Statistic - Circuit has 1208 transistors
Statistic - Control has 27 tracks
Statistic - Power consumption is 0.20[illegible] Watts
Herald - 26808, 13048 - Laying out data-path
Herald - 27264, 13272 - Organelle unit# [illegible] bit [illegible]
Herald - 27788, 13612 - Organelle unit# [illegible] bit [illegible]
Herald - 27815, 13612 - Organelle unit# [illegible] bit [illegible]
Herald - 27841, 13612 - Organelle unit# [illegible] bit [illegible]
Herald - 27983, 13727 - Organelle unit# [illegible] bit [illegible]
Herald - 28111, 13727 - Organelle unit# [illegible] bit [illegible]
Herald - 28292, 13845 - Organelle unit# [illegible] bit [illegible]
Herald - 28320, 13845 - Organelle unit# [illegible] bit [illegible]
Herald - 28349, 13845 - Organelle unit# [illegible] bit [illegible]
Herald - 28499, 13965 - Organelle unit# [illegible] bit [illegible]
Herald - 28634, 13965 - Organelle unit# [illegible] bit [illegible]
Herald - 28886, 14082 - Organelle unit# [illegible] bit [illegible]
Herald - 28920, 14082 - Organelle unit# [illegible] bit [illegible]
Herald - 29186, 14313 - Organelle unit# [illegible] bit [illegible]
Herald - 29220, 14313 - Organelle unit# [illegible] bit [illegible]
Herald - 29360, 14313 - Organelle unit# [illegible] bit [illegible]
Herald - 29497, 14430 - Organelle unit# [illegible] bit [illegible]
Herald - 29509, 14430 - Organelle unit# [illegible] bit [illegible]
Herald - 29521, 14430 - Organelle unit# [illegible] bit [illegible]
Herald - 29532, 14430 - Organelle unit# [illegible] bit [illegible]
Herald - 29602, 14430 - Organelle unit# [illegible] bit [illegible]
Herald - 30093, 14671 - Organelle unit# [illegible] bit [illegible]
Herald - 30290, 14794 - Organelle unit# [illegible] bit [illegible]
Herald - 30358, 14794 - Organelle unit# [illegible] bit [illegible]
Herald - 30551, 14919 - Organelle unit# [illegible] bit [illegible]
Herald - 31072, 15171 - Organelle unit# [illegible] bit [illegible]
Herald - 31346, 15296 - Organelle unit# [illegible] bit [illegible]
Herald - 31388, 15296 - Organelle unit# [illegible] bit [illegible]
Herald - 31431, 15296 - Organelle unit# [illegible] bit [illegible]
Herald - 31599, 15421 - Organelle unit# [illegible] bit [illegible]
Herald - 31766, 15421 - Organelle unit# [illegible] bit [illegible]
Herald - 31972, 15545 - Organelle unit# [illegible] bit [illegible]
Herald - 32001, 15545 - Organelle unit# [illegible] bit [illegible]
Herald - 32031, 15545 - Organelle unit# [illegible] bit [illegible]
Herald - 32188, 15671 - Organelle unit# [illegible] bit [illegible]
Herald - 32331, 15671 - Organelle unit# [illegible] bit [illegible]
Herald - 32342, 15671 - Organelle unit# [illegible] bit [illegible]
Herald - 32354, 15671 - Organelle unit# [illegible] bit [illegible]
Herald - 32493, 15800 - Organelle unit# [illegible] bit [illegible]
Herald - 32505, 15800 - Organelle unit# [illegible] bit [illegible]
Herald - 32560, 15800 - Organelle unit# [illegible] bit [illegible]
Herald - 32915, 15930 - Organelle unit# [illegible] bit [illegible]
Herald - 33125, 16060 - Organelle unit# [illegible] bit [illegible]
Herald - 33341, 16196 - Organelle unit# [illegible] bit [illegible]
Herald - 33422, 16196 - Organelle unit# [illegible] bit [illegible]
Herald - 33983, 16459 - Organelle unit# [illegible] bit [illegible]
Herald - 34082, 16459 - Organelle unit# [illegible] bit [illegible]
Herald - 34297, 16590 - Organelle unit# [illegible] bit [illegible]
Herald - 34515, 16722 - Organelle unit# [illegible] bit [illegible]
Herald - 34601, 16722 - Organelle unit# [illegible] bit [illegible]
Statistic - Data-path internal bus uses [illegible] tracks
Herald - 35348, 16992 - Laying out control
Herald - 41246, 19921 - Laying out flags
Herald - 41742, 20059 - Laying out river
Herald - 41993, 20197 - Laying out wing
Herald - 42015, 20197 - Laying out skeleton
Herald - 42180, 20197 - Laying out pins
Statistic - Dimensions are 5.7[illegible] mm by [illegible].125000 mm
Herald - 49229, 23494 - Outputing .cif file
Statistic - Memory used - 518K
Statistic - Compilation took 13.804167 CPU minutes
Statistic - Garbage collection took 6.569723 CPU minutes
Statistic - For a total of 199 garbage collections
;GRAY CODE to BINARY conversion algorithm
(program gc2s 2
(def 1 ground)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def reset signal input 5)
(def inp signal input 6)
(def bin signal output 7)
(def 8 power)
(process grycod 0
msbs
  (cond ((not inp) (setq bin (not inp)) (go msbs))
        (inp (setq bin inp) (go compl)))
compl
  (cond ((not inp) (setq bin inp) (go compl))
        (inp (setq bin (not inp)) (go nextbit)))
nextbit
  (cond ((not inp) (setq bin (not inp)) (go nextbit))
        (inp (setq bin inp) (go compl)))))


THIS ALGORITHM EXHIBITS THE GRAY CODE
DECODING SCHEME DONE IN THE CONTROL PATH.
THE ONLY DATA PATH ORGANELLES INSTANTIATED
ARE THOSE ASSOCIATED WITH THE SEQUENCER. THE
WIDTH OF THE SEQUENCER (2 BITS) IS DEFINED
EXPLICITLY IN THE PROGRAM STATEMENT, EVEN
THOUGH NO ACTUAL DATA PATH (AS SUCH) EXISTS.
THE IMPLICATION IS THAT FSMs CAN BE CREATED
WITHOUT AN "ACTUAL DATA PATH".
Statistic - for project gcs
Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - [illegible], 55 - Reading source file - gcs.mac
Herald - [illegible], 55 - Reading library from - /vlsi/macpit/library
Herald - [illegible], 598 - Processing definitions
Herald - [illegible], [illegible] - Evaluating evals
Herald - 995, [illegible] - Expanding macros
Herald - [illegible], [illegible] - Extracting sources
Herald - [illegible], 692 - Extracting destinations
Herald - [illegible], 692 - Extracting labels
Herald - [illegible], 692 - Extracting sequencers
Herald - [illegible], 692 - Extracting flags, data-path, control, and pins
Statistic - Maximum control depth is 4
Statistic - Number of gates is 25
Statistic - Data-path has 4 Units
Herald - 2138, 1378 - Outputing .obj file
Herald - 2214, 1378 - Extruding gates
Statistic - Control has 29 columns
Herald - 8365, 4632 - Extruding straps
Statistic - Circuit has 215 transistors
Statistic - Control has 13 tracks
Statistic - Power consumption is 0.041979 Watts
Herald - 8769, 4850 - Laying out data-path
Herald - 8803, 4850 - Organelle unit# 1 bit 1
Herald - 9319, 5181 - Organelle unit# 1 bit 0
Herald - 9397, 5181 - Organelle unit# 2 bit 1
Herald - 9564, 5296 - Organelle unit# 2 bit 0
Herald - 9635, 5296 - Organelle unit# 3 bit 1
Herald - 9891, 5407 - Organelle unit# 3 bit 0
Herald - 10088, 5518 - Organelle unit# 4 bit 1
Herald - 10101, 5518 - Organelle unit# 4 bit 0
Statistic - Data-path internal bus uses 3 tracks
Herald - [illegible], 5518 - Laying out control
Herald - [illegible], 6353 - Laying out flags
Herald - [illegible], 6353 - Laying out river
Herald - [illegible], 6469 - Laying out wing
Herald - [illegible], 6469 - Laying out skeleton
Herald - [illegible], 6469 - Laying out pins
Statistic - Dimensions are 1.742500 mm by 1.942500 mm
Herald - 14192, 7428 - Outputing .cif file
Statistic - Memory used - 377K
Statistic - Compilation took 4.008611 CPU minutes
Statistic - Garbage collection took 2.098333 CPU minutes
Statistic - For a total of 71 garbage collections
;DPLC2.MAC

(program dplc2 5                          ;there are 5 outputs
(def 13 power)
(def 1 ground)
(def 2 phia)
(def 3 phib)
(def 4 phic)
(def c signal input 5)                    ;note use of Boolean inputs
(def tl signal input 6)
(def ts signal input 7)
(def reset signal input 14)
(def lc port output (8 9 10 11 12))       ;and integer outputs
(process light_controller 0               ;stipulates FSM architecture
hg                                        ;HIGHWAY GREEN state
  (cond ((not (and c tl))                 ;if TRUE, set these outputs
         (setq lc 4)
         (go hg))
        (t (setq lc 5)
           (go hy)))
hy                                        ;HIGHWAY YELLOW state
  (cond ((not ts)
         (setq lc 12)
         (go hy))
        (t (setq lc 13)
           (go fg)))
fg                                        ;FARMROAD GREEN state
  (cond ((not (or tl (not c)))
         (setq lc 16)
         (go fg))
        (t (setq lc 17)
           (go fy)))
fy                                        ;FARMROAD YELLOW state
  (cond ((not ts)
         (setq lc 18)
         (go fy))
        (t (setq lc 19)
           (go hg)))))
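
The state transitions in the DPLC2 listing can be traced by hand with a small software model. The Python sketch below is only an illustrative rendering of the four states and the `lc` values that appear in the listing; the function name `step` and the use of Python booleans for `c`, `tl`, and `ts` are assumptions made for the example, and the model ignores the clocking and reset behavior of the compiled chip.

```python
def step(state, c, tl, ts):
    """Return (lc, next_state) for one evaluation of the light controller.

    state is one of "hg", "hy", "fg", "fy"; c, tl, ts are booleans.
    The lc values are the integers assigned by setq in the listing.
    """
    if state == "hg":                       # HIGHWAY GREEN
        return (4, "hg") if not (c and tl) else (5, "hy")
    if state == "hy":                       # HIGHWAY YELLOW
        return (12, "hy") if not ts else (13, "fg")
    if state == "fg":                       # FARMROAD GREEN
        return (16, "fg") if not (tl or not c) else (17, "fy")
    if state == "fy":                       # FARMROAD YELLOW
        return (18, "fy") if not ts else (19, "hg")
    raise ValueError("unknown state: %r" % (state,))


# Walk once around the cycle hg -> hy -> fg -> fy -> hg.
state = "hg"
trace = []
for c, tl, ts in [(True, True, False),   # car waiting and long timer expired
                  (False, False, True),  # short timer expired
                  (False, False, False), # no car on the farm road
                  (False, False, True)]: # short timer expired
    lc, state = step(state, c, tl, ts)
    trace.append((lc, state))
```

Each branch mirrors one `cond` clause of the listing, so the table of `(lc, next_state)` pairs can be compared directly against the source when checking a change to the program.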
Statistic - for project dplc2
Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - 62, 55 - Reading source file - dplc2.mac
Herald - 68, 55 - Reading library from - /vlsi/macpit/library
Herald - 905, 604 - Processing definitions
Herald - 906, 604 - Evaluating evals
Herald - 989, 604 - Expanding macros
Herald - 1107, 702 - Extracting sources
Herald - 1111, 702 - Extracting destinations
Herald - 1114, 702 - Extracting labels
Herald - 1114, 702 - Extracting sequencers
Herald - 1117, 702 - Extracting flags, data-path, control, and pins
Statistic - Maximum control depth is 5
Statistic - Number of gates is 34
Statistic - Data-path has 4 Units
Herald - 2277, 1498 - Outputing .obj file
Herald - 2410, 1498 - Extruding gates
Statistic - Control has 40 columns
Herald - 8931, 4725 - Extruding straps
Statistic - Circuit has 346 transistors
Statistic - Control has 17 tracks
Statistic - Power consumption is 0.056716 Watts
Herald - 9580, 5048 - Laying out data-path
Herald - 9922, 5267 - Organelle unit# 1 bit 4
Herald - 10156, 5379 - Organelle unit# [illegible] bit [illegible]
Herald - 10207, 5379 - Organelle unit# [illegible] bit [illegible]
Herald - 10375, 5498 - Organelle unit# [illegible] bit [illegible]
Herald - 10533, 5607 - Organelle unit# [illegible] bit [illegible]
Herald - 10859, 5718 - Organelle unit# [illegible] bit [illegible]
Herald - 11242, 5928 - Organelle unit# [illegible] bit [illegible]
Herald - 11266, 5928 - Organelle unit# [illegible] bit [illegible]
Herald - 11291, 5928 - Organelle unit# [illegible] bit [illegible]
Herald - 11316, 5928 - Organelle unit# [illegible] bit [illegible]
Herald - 11552, 6042 - Organelle unit# [illegible] bit [illegible]
Herald - 11590, 6042 - Organelle unit# [illegible] bit [illegible]
Herald - 11722, 6148 - Organelle unit# [illegible] bit [illegible]
Herald - 11748, 6148 - Organelle unit# [illegible] bit [illegible]
Herald - 11777, 6148 - Organelle unit# [illegible] bit [illegible]
Herald - 12052, 6272 - Organelle unit# [illegible] bit [illegible]
Herald - 12068, 6272 - Organelle unit# [illegible] bit [illegible]
Herald - 12080, 6272 - Organelle unit# [illegible] bit [illegible]
Herald - 12204, 6383 - Organelle unit# [illegible] bit [illegible]
Herald - 12216, 6383 - Organelle unit# [illegible] bit [illegible]
Statistic - Data-path internal bus uses [illegible] tracks
Herald - 12313, 6383 - Laying out control
Herald - 14457, 7438 - Laying out flags
Herald - 14461, 7438 - Laying out river
Herald - 14506, 7438 - Laying out wing
Herald - 14521, 7438 - Laying out skeleton
Herald - 14578, 7438 - Laying out pins
Statistic - Dimensions are [illegible] mm by [illegible].460000 mm
Herald - 18275, 9184 - Outputing .cif file
Statistic - Memory used - 414K
Statistic - Compilation took 5.164444 CPU minutes
Statistic - Garbage collection took 2.586667 CPU minutes
Statistic - For a total of 86 garbage collections
APPENDIX  C


CHAPTER  V  LISTINGS 


Script started on Sat Jun 15 15:14:27 1985
% /vlsi/berk85/bin/crystal splacis.sim
Crystal, v.2
: build splacis.sim
[0:00.5u 0:00.2s 31k]
: inpits c tl ts phia phib
Unknown command: inpits
: inputs c tl ts phia phib
[0:00.0u 0:00.1s 40k]
: outputs st hl0 hl1 fl0 fl1
[0:00.0u 0:00.0s 40k]
: delay phia 0 -1
Marking transistor flow...
Setting Vdd to 1...
Setting GND to 0...
(198 stages examined.)
[0:00.5u 0:00.1s 47k]
: critical
Node hl0 is driven high at 26.93ns
    ...through fet at (154, -155) to Vdd after
50 is driven low at 23.99ns
    ...through fet at (158, -106) to 93
    ...through fet at (156, -59) to GND after
156 is driven high at 18.05ns
    ...through fet at (5, -61) to Vdd after
73 is driven low at 9.33ns
    ...through fet at (69, -113) to GND after
41 is driven high at 6.31ns
    ...through fet at (75, -124) to Vdd after
27 is driven low at 1.95ns
    ...through fet at (76, -153) to 4
    ...through fet at (119, -126) to GND after
phia is driven high at 0.00ns
[0:00.1u 0:00.0s 47k]
: critical -g splaphia
[0:00.0u 0:00.1s 52k]
: clear
[0:00.0u 0:00.0s 52k]
: delay phib 0 -1
Marking transistor flow...
Setting Vdd to 1...
Setting GND to 0...
(126 stages examined.)
[0:00.3u 0:00.0s 52k]
: critical
Node hl0 is driven high at 32.06ns
    ...through fet at (154, -155) to Vdd after
50 is driven low at 29.11ns
    ...through fet at (158, -106) to 93
    ...through fet at (156, -59) to GND after
156 is driven high at 23.17ns
    ...through fet at (5, -61) to Vdd after
73 is driven low at 14.46ns
    ...through fet at (69, -113) to GND after
41 is driven high at 11.43ns
    ...through fet at (75, -124) to Vdd after
27 is driven low at 6.97ns
    ...through fet at (76, -153) to 4
    ...through fet at (119, -126) to GND after
59 is driven high at 2.67ns
    ...through fet at (118, -106) to 88
    ...through fet at (117, 11) to Vdd after
phib is driven high at 0.00ns
[0:00.1u 0:00.1s 52k]
: critical -g splaphib
[0:00.1u 0:00.1s 52k]
: quit
[0:01.7u 0:00.5s 52k] Crystal done.
% ^D
script done on Sat Jun 15 15:16:58 1985


Crystal Timing Analysis of Cis PLA
Script started on Sat Jun 15 15:18:00 1985
% /vlsi/berk85/bin/crystal lt.sim
Crystal, v.2
: build lt.sim
[0:00.8u 0:00.2s 39k]
: inputs phia phib c tl ts
[0:00.0u 0:00.1s 48k]
: outputs st fl0 fl1 hl0 hl1
[0:00.0u 0:00.0s 48k]
: delay phia 0 -1
Marking transistor flow...
Setting Vdd to 1...
Setting GND to 0...
(21 stages examined.)
[0:00.7u 0:00.1s 50k]
: critical
Node 228 is driven low at 10.15ns
    ...through fet at (569, 453) to 262
    ...through fet at (568, 570) to 88
    ...through fet at (456, 538) to 411
    ...through fet at (480, 537) to GND after
260 is driven high at 4.92ns
    ...through fet at (416, 930) to Vdd after
533* is driven low at 0.75ns
    ...through fet at (365, 942) to GND after
phia is driven high at 0.00ns
[0:00.1u 0:00.0s 50k]
: critical -g ltphia
[0:00.0u 0:00.1s 55k]
: clear
[0:00.0u 0:00.0s 55k]
: delay phib 0 -1
(221 stages examined.)
[0:00.8u 0:00.0s 60k]
: critical
Node st is driven low at 135.82ns
    ...through fet at (911, 583) to GND after
373 is driven high at 133.89ns
    ...through fet at (893, 510) to Vdd after
396* is driven low at 131.02ns
    ...through fet at (866, 570) to GND after
364* is driven high at 123.52ns
    ...through fet at (877, 510) to Vdd after
76* is driven low at 108.50ns
    ...through fet at (584, 411) to 88
    ...through fet at (478, 435) to 201
    ...through fet at (472, 415) to GND after
190* is driven high at 15.03ns
    ...through fet at (479, 406) to 163
    ...through fet at (666, 930) to Vdd after
181* is driven high at 4.91ns
    ...through fet at (541, 930) to Vdd after
535* is driven low at 0.75ns
    ...through fet at (490, 942) to GND after
phib is driven high at 0.00ns
[0:00.1u 0:00.1s 60k]
: critical -g ltphib
[0:00.1u 0:00.0s 65k]
Script started on Thu Jun 13 23:30:02 1985

% powest -p < lt.sim

gamma=0.4V**.5, tox=9e-08m, u0=0.08m**2/V-s

vdd=5V, vtd=-3.5V, vte=0.8V, vsb=2V

#devs    Pdc_avg (W)      Pdc_max (W)      type

0        0.000000         0.000000         enhancement pullups

20       0.011980         0.023959         depletion pullups

15       0.030536         0.061072         special depletion pullups

35       0.042516         0.085032         TOTAL

% ^D

script done on Thu Jun 13 23:31:12 1985
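
As a quick consistency check on the power estimate above, the per-type rows can be summed and compared with the TOTAL row. The short Python sketch below simply copies the figures from the listing; the variable names are illustrative, and the small tolerance allows for the six-decimal rounding in the printed values.

```python
# (number of devices, average DC power in W, maximum DC power in W)
rows = [
    (0, 0.000000, 0.000000),    # enhancement pullups
    (20, 0.011980, 0.023959),   # depletion pullups
    (15, 0.030536, 0.061072),   # special depletion pullups
]
total = (35, 0.042516, 0.085032)  # TOTAL row as printed

devs = sum(r[0] for r in rows)
pdc_avg = sum(r[1] for r in rows)
pdc_max = sum(r[2] for r in rows)

# Allow for rounding of each printed row to six decimal places.
TOL = 2e-6
assert devs == total[0]
assert abs(pdc_avg - total[1]) < TOL
assert abs(pdc_max - total[2]) < TOL
```

The rows agree with the printed totals to within rounding.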
% /vlsi/berk85/bin/crystal stop.sim

: inputs c tl ts rst

: outputs st hl0 hl1 fl0 fl1

: set 1 phia phic

: delay phib 0 -1

: critical (9.6ns)

: clear

: set 1 phia

: delay phib -1 0

: delay phic -1 0

: critical (55.67ns)

: clear

: set 0 phib phic

: delay phia -1 0

: critical (17.55ns)

: clear

: set 0 phib phic

: delay phia 0 -1

: critical (54.53ns)

: clear

: set 1 phia

: set 0 phib

: delay phic 0 -1

: critical (363.52ns)

: quit
APPENDIX  D
CHAPTER  VI  LISTINGS


Statistic - for project ham15.4
Statistic - options: (herald opt-d opt-c stat obj cif nologo)
Herald - 59, 52 - Reading source file - ham15.4.mac
Herald - 78, 52 - Reading library from - /vlsi/macpit/library
Herald - 890, 591 - Processing definitions
Herald - 894, 591 - Evaluating evals
Herald - 980, 591 - Expanding macros
Herald - 2822, 1405 - Extracting sources
Herald - 2982, 1511 - Extracting destinations
Herald - 3015, 1511 - Extracting labels
Herald - 3015, 1627 - Extracting sequencers
Herald - 3131, 1627 - Extracting flags, data-path, control, and pins
Statistic - Maximum control depth is 7
Statistic - Number of gates is 140
Statistic - Data-path has 0 Units
Herald - 9964, 4968 - Outputing .obj file
Herald - 10373, 4968 - Extruding gates
Statistic - Control has 155 columns
Herald - 586415, 233036 - Extruding straps
Statistic - Circuit has 715 transistors
Statistic - Control has 42 tracks
Statistic - Power consumption is 0.160860 Watts
Herald - 589965, 234452 - Laying out data-path
Statistic - Data-path internal bus uses 0 tracks
Herald - 589967, 234452 - Laying out control
Herald - 599196, 239812 - Laying out flags
Herald - 599197, 239812 - Laying out river
Herald - 599206, 239812 - Laying out wing
Herald - 599281, 239812 - Laying out skeleton
Herald - 599325, 239812 - Laying out pins
Statistic - Dimensions are 5.137500 mm by 4.005000 mm
Herald - 606259, 242522 - Outputing .cif file
Statistic - Memory used - 529K
Statistic - Compilation took 168.593902 CPU minutes
Statistic - Garbage collection took 67.456947 CPU minutes
Statistic - For a total of 1805 garbage collections


Ham15dc.scr
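
A few derived figures are often useful when comparing compilations of this kind. The Python sketch below takes the dimensions, power, and timing values reported in the listing above and computes the chip area, the power per unit area, and the ratio of garbage collection time to the reported compilation time. The variable names are illustrative, and whether the reported compilation time already includes garbage collection is not stated in the log, so the ratio should be read only as a rough indicator.

```python
# Values copied from the compiler statistics in the listing above.
width_mm = 5.137500
height_mm = 4.005000
power_w = 0.160860
compile_cpu_min = 168.593902
gc_cpu_min = 67.456947
gc_count = 1805

# Chip area from the reported bounding dimensions.
area_mm2 = width_mm * height_mm

# Estimated power per unit area, in milliwatts per square millimetre.
power_density_mw_per_mm2 = 1000.0 * power_w / area_mm2

# Garbage collection time relative to the reported compilation time.
gc_ratio = gc_cpu_min / compile_cpu_min

# Average CPU seconds spent per garbage collection.
gc_seconds_each = 60.0 * gc_cpu_min / gc_count

print("area          : %.4f mm^2" % area_mm2)
print("power density : %.3f mW/mm^2" % power_density_mw_per_mm2)
print("GC / compile  : %.3f" % gc_ratio)
print("CPU s per GC  : %.2f" % gc_seconds_each)
```

The same arithmetic can be applied to the statistics of the other compilations in these appendices to compare designs on an equal footing.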
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